linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-25 10:56:11 +07:00

Author	SHA1	Message	Date
Nicolas Saenz Julienne	4713787c1f	clk: bcm: dvp: Add MODULE_DEVICE_TABLE() [ Upstream commit be439cc4c404f646a8ba090fa786d53c10926b12 ] Add MODULE_DEVICE_TABLE() so as to be able to use the driver as a module. More precisely, for the driver to be loaded automatically at boot. Fixes: `1bc9597271` ("clk: bcm: Add BCM2711 DVP driver") Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Link: https://lore.kernel.org/r/20201202103518.21889-1-nsaenzjulienne@suse.de Reviewed-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:54:00 +01:00
Soheil Hassas Yeganeh	1afb979cdc	epoll: check for events when removing a timed out thread from the wait queue [ Upstream commit 289caf5d8f6c61c6d2b7fd752a7f483cd153f182 ] Patch series "simplify ep_poll". This patch series is a followup based on the suggestions and feedback by Linus: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com The first patch in the series is a fix for the epoll race in presence of timeouts, so that it can be cleanly backported to all affected stable kernels. The rest of the patch series simplify the ep_poll() implementation. Some of these simplifications result in minor performance enhancements as well. We have kept these changes under self tests and internal benchmarks for a few days, and there are minor (1-2%) performance enhancements as a result. This patch (of 8): After `abc610e01c` ("fs/epoll: avoid barrier after an epoll_wait(2) timeout"), we break out of the ep_poll loop upon timeout, without checking whether there is any new events available. Prior to that patch-series we always called ep_events_available() after exiting the loop. This can cause races and missed wakeups. For example, consider the following scenario reported by Guantao Liu: Suppose we have an eventfd added using EPOLLET to an epollfd. Thread 1: Sleeps for just below 5ms and then writes to an eventfd. Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an event of the eventfd, it will write back on that fd. Thread 3: Calls epoll_wait with a negative timeout. Prior to `abc610e01c`, it is guaranteed that Thread 3 will wake up either by Thread 1 or Thread 2. After `abc610e01c`, Thread 3 can be blocked indefinitely if Thread 2 sees a timeout right before the write to the eventfd by Thread 1. Thread 2 will be woken up from schedule_hrtimeout_range and, with evail 0, it will not call ep_send_events(). To fix this issue: 1) Simplify the timed_out case as suggested by Linus. 2) while holding the lock, recheck whether the thread was woken up after its time out has reached. Note that (2) is different from Linus' original suggestion: It do not set "eavail = ep_events_available(ep)" to avoid unnecessary contention (when there are too many timed-out threads and a small number of events), as well as races mentioned in the discussion thread. This is the first patch in the series so that the backport to stable releases is straightforward. Link: https://lkml.kernel.org/r/20201106231635.3528496-1-soheil.kdev@gmail.com Link: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com Link: https://lkml.kernel.org/r/20201106231635.3528496-2-soheil.kdev@gmail.com Fixes: `abc610e01c` ("fs/epoll: avoid barrier after an epoll_wait(2) timeout") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Guantao Liu <guantaol@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Guantao Liu <guantaol@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:54:00 +01:00
Zhang Changzhong	b7bc097f29	vhost scsi: fix error return code in vhost_scsi_set_endpoint() [ Upstream commit 2e1139d613c7fb0956e82f72a8281c0a475ad4f8 ] Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `25b98b64e2` ("vhost scsi: alloc cmds per vq instead of session") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1607071411-33484-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:54:00 +01:00
Dan Carpenter	dbdfefc71a	virtio_ring: Fix two use after free bugs [ Upstream commit e152d8af4220a05c9797591609151d404866beaa ] The "vq" struct is added to the "vdev->vqs" list prematurely. If we encounter an error later in the function then the "vq" is freed, but since it is still on the list that could lead to a use after free bug. Fixes: `cbeedb72b9` ("virtio_ring: allocate desc state for split ring separately") Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de> Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8pGaG/zkI3jk8mk@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:54:00 +01:00
Dan Carpenter	78b35fd94c	virtio_net: Fix error code in probe() [ Upstream commit 411ea23a76526e6efed0b601abb603d3c981b333 ] Set a negative error code intead of returning success if the MTU has been changed to something invalid. Fixes: `fe36cbe067` ("virtio_net: clear MTU when out of range") Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de> Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8pGVJSeeCdII1Ys@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:54:00 +01:00
Dan Carpenter	bfffbd34bb	virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed() [ Upstream commit ae93d8ea0fa701e84ab9df0db9fb60ec6c80d7b8 ] There is a copy and paste bug in the error handling of this code and it uses "ring_dma_addr" three times instead of "device_event_dma_addr" and "driver_event_dma_addr". Fixes: `1ce9e6055f` (" virtio_ring: introduce packed ring support") Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de> Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8pGRJlEzyn+04u2@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:54:00 +01:00
Eli Cohen	069fedf3fb	vdpa/mlx5: Use write memory barrier after updating CQ index [ Upstream commit 83ef73b27eb2363f44faf9c3ee28a3fe752cfd15 ] Make sure to put dma write memory barrier after updating CQ consumer index so the hardware knows that there are available CQE slots in the queue. Failure to do this can cause the update of the RX doorbell record to get updated before the CQ consumer index resulting in CQ overrun. Fixes: `1a86b377aa` ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20201209140004.15892-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:54:00 +01:00
Simon Horman	8ae3143000	nfp: move indirect block cleanup to flower app stop callback [ Upstream commit 5b33afee93a1e7665a5ffae027fc66f9376f4ea7 ] The indirect block cleanup may cause control messages to be sent if offloaded flows are present. However, by the time the flower app cleanup callback is called txbufs are no longer available and attempts to send control messages result in a NULL-pointer dereference in nfp_ctrl_tx_one(). This problem may be resolved by moving the indirect block cleanup to the stop callback, where txbufs are still available. As suggested by Jakub Kicinski and Louis Peens. Fixes: `a1db217861` ("net: flow_offload: fix flow_indr_dev_unregister path") Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Link: https://lore.kernel.org/r/20201216145701.30005-1-simon.horman@netronome.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:54:00 +01:00
Dan Carpenter	466587ce57	qlcnic: Fix error code in probe [ Upstream commit 0d52848632a357948028eab67ff9b7cc0c12a0fb ] Return -EINVAL if we can't find the correct device. Currently it returns success. Fixes: `13159183ec` ("qlcnic: 83xx base driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X9nHbMqEyI/xPfGd@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:59 +01:00
Zheng Zengkai	98c9b3aeff	perf record: Fix memory leak when using '--user-regs=?' to list registers [ Upstream commit 2eb5dd418034ecea2f7031e3d33f2991a878b148 ] When using 'perf record's option '-I' or '--user-regs=' along with argument '?' to list available register names, memory of variable 'os' allocated by strdup() needs to be released before __parse_regs() returns, otherwise memory leak will occur. Fixes: `bcc84ec65a` ("perf record: Add ability to name registers to record") Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20200703093344.189450-1-zhengzengkai@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:59 +01:00
Jiri Olsa	03cbbd5648	tools build: Add missing libcap to test-all.bin target [ Upstream commit 09d59c2f3465fb01e65a0c96698697b026ea8e79 ] We're missing -lcap in test-all.bin target, so in case it's the only library missing (if more are missing test-all.bin fails anyway), we will falsely claim that we detected it and fail build, like: $ make ... Auto-detecting system features: ... dwarf: [ on ] ... dwarf_getlocations: [ on ] ... glibc: [ on ] ... libbfd: [ on ] ... libbfd-buildid: [ on ] ... libcap: [ on ] ... libelf: [ on ] ... libnuma: [ on ] ... numa_num_possible_cpus: [ on ] ... libperl: [ on ] ... libpython: [ on ] ... libcrypto: [ on ] ... libunwind: [ on ] ... libdw-dwarf-unwind: [ on ] ... zlib: [ on ] ... lzma: [ on ] ... get_cpuid: [ on ] ... bpf: [ on ] ... libaio: [ on ] ... libzstd: [ on ] ... disassembler-four-args: [ on ] ... CC builtin-ftrace.o In file included from builtin-ftrace.c:29: util/cap.h:11:10: fatal error: sys/capability.h: No such file or directory 11 \| #include <sys/capability.h> \| ^~~~~~~~~~~~~~~~~~ compilation terminated. Fixes: `74d5f3d06f` ("tools build: Add capability-related feature detection") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201203230836.3751981-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:59 +01:00
Pavel Begunkov	a773dea1a9	io_uring: cancel only requests of current task [ Upstream commit df9923f96717d0aebb0a73adbcf6285fa79e38cb ] io_uring_cancel_files() cancels all request that match files regardless of task. There is no real need in that, cancel only requests of the specified task. That also handles SQPOLL case as it already changes task to it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:59 +01:00
Thierry Reding	4b14874409	pwm: sun4i: Remove erroneous else branch [ Upstream commit 6eefb79d6f5bc4086bd02c76f1072dd4a8d9d9f6 ] Commit `d3817a6470` ("pwm: sun4i: Remove redundant needs_delay") changed the logic of an else branch so that the PWM_EN and PWM_CLK_GATING bits are now cleared if the PWM is to be disabled, whereas previously the condition was always false, and hence the branch never got executed. This code is reported causing backlight issues on boards based on the Allwinner A20 SoC. Fix this by removing the else branch, which restores the behaviour prior to the offending commit. Note that the PWM_EN and PWM_CLK_GATING bits still get cleared later in sun4i_pwm_apply() if the PWM is to be disabled. Fixes: `d3817a6470` ("pwm: sun4i: Remove redundant needs_delay") Reported-by: Taras Galchenko <tpgalchenko@gmail.com> Suggested-by: Taras Galchenko <tpgalchenko@gmail.com> Tested-by: Taras Galchenko <tpgalchenko@gmail.com> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:59 +01:00
Uwe Kleine-König	7c4544a216	pwm: imx27: Fix overflow for bigger periods [ Upstream commit 1ce65396e6b2386b4fd54f87beff0647a772e1cd ] The second parameter of do_div is an u32 and NSEC_PER_SEC * prescale overflows this for bigger periods. Assuming the usual pwm input clk rate of 66 MHz this happens starting at requested period > 606060 ns. Splitting the division into two operations doesn't loose any precision. It doesn't need to be feared that c / NSEC_PER_SEC doesn't fit into the unsigned long variable "duty_cycles" because in this case the assignment above to period_cycles would already have been overflowing as period >= duty_cycle and then the calculation is moot anyhow. Fixes: `aef1a3799b` ("pwm: imx27: Fix rounding behavior") Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org> Tested-by: Johannes Pointner <johannes.pointner@br-automation.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:59 +01:00
Lokesh Vutla	2cacf60c92	pwm: lp3943: Dynamically allocate PWM chip base [ Upstream commit 1f0f1e80fdd3aa9631f6c22cda4f8550cfcfcc3e ] When there are other PWM controllers enabled along with pwm-lp3943, pwm-lp3942 is failing to probe with -EEXIST error. This is because other PWM controllers are probed first and assigned PWM base 0 and pwm-lp3943 is requesting for 0 again. In order to avoid this, assign the chip base with -1, so that it is dynamically allocated. Fixes: `af66b3c093` ("pwm: Add LP3943 PWM driver") Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Reviewed-by: Uwe Kleine-König <u.kleine-könig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:59 +01:00
Uwe Kleine-König	00fb97e2d7	pwm: zx: Add missing cleanup in error path [ Upstream commit 269effd03f6142df4c74814cfdd5f0b041b30bf9 ] zx_pwm_probe() called clk_prepare_enable() before; this must be undone in the error path. Fixes: `4836193c43` ("pwm: Add ZTE ZX PWM device driver") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:58 +01:00
Zhang Qilong	91877b1fb0	clk: ti: Fix memleak in ti_fapll_synth_setup [ Upstream commit 8c6239f6e95f583bb763d0228e02d4dd0fb3d492 ] If clk_register fails, we should goto free branch before function returns to prevent memleak. Fixes: `163152cbbe` ("clk: ti: Add support for FAPLL on dm816x") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201113131623.2098222-1-zhangqilong3@huawei.com Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:58 +01:00
Arnd Bergmann	43fc2d3a4a	watchdog: coh901327: add COMMON_CLK dependency [ Upstream commit 36c47df85ee8e1f8a35366ac11324f8875de00eb ] clang produces a build failure in configurations without COMMON_CLK when a timeout calculation goes wrong: arm-linux-gnueabi-ld: drivers/watchdog/coh901327_wdt.o: in function `coh901327_enable': coh901327_wdt.c:(.text+0x50): undefined reference to `__bad_udelay' Add a Kconfig dependency to only do build testing when COMMON_CLK is enabled. Fixes: `da2a68b3eb` ("watchdog: Enable COMPILE_TEST where possible") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20201203223358.1269372-1-arnd@kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:58 +01:00
Manivannan Sadhasivam	45867d2ee4	watchdog: qcom: Avoid context switch in restart handler [ Upstream commit 7948fab26bcc468aa2a76462f441291b5fb0d5c7 ] The use of msleep() in the restart handler will cause scheduler to induce a context switch which is not desirable. This generates below warning on SDX55 when WDT is the only available restart source: [ 39.800188] reboot: Restarting system [ 39.804115] ------------[ cut here ]------------ [ 39.807855] WARNING: CPU: 0 PID: 678 at kernel/rcu/tree_plugin.h:297 rcu_note_context_switch+0x190/0x764 [ 39.812538] Modules linked in: [ 39.821954] CPU: 0 PID: 678 Comm: reboot Not tainted 5.10.0-rc1-00063-g33a9990d1d66-dirty #47 [ 39.824854] Hardware name: Generic DT based system [ 39.833470] [<c0310fbc>] (unwind_backtrace) from [<c030c544>] (show_stack+0x10/0x14) [ 39.838154] [<c030c544>] (show_stack) from [<c0c218f0>] (dump_stack+0x8c/0xa0) [ 39.846049] [<c0c218f0>] (dump_stack) from [<c0322f80>] (__warn+0xd8/0xf0) [ 39.853058] [<c0322f80>] (__warn) from [<c0c1dc08>] (warn_slowpath_fmt+0x64/0xc8) [ 39.859925] [<c0c1dc08>] (warn_slowpath_fmt) from [<c038b6f4>] (rcu_note_context_switch+0x190/0x764) [ 39.867503] [<c038b6f4>] (rcu_note_context_switch) from [<c0c2aa3c>] (__schedule+0x84/0x640) [ 39.876685] [<c0c2aa3c>] (__schedule) from [<c0c2b050>] (schedule+0x58/0x10c) [ 39.885095] [<c0c2b050>] (schedule) from [<c0c2eed0>] (schedule_timeout+0x1e8/0x3d4) [ 39.892135] [<c0c2eed0>] (schedule_timeout) from [<c039ad40>] (msleep+0x2c/0x38) [ 39.899947] [<c039ad40>] (msleep) from [<c0a59d0c>] (qcom_wdt_restart+0xc4/0xcc) [ 39.907319] [<c0a59d0c>] (qcom_wdt_restart) from [<c0a58290>] (watchdog_restart_notifier+0x18/0x28) [ 39.914715] [<c0a58290>] (watchdog_restart_notifier) from [<c03468e0>] (atomic_notifier_call_chain+0x60/0x84) [ 39.923487] [<c03468e0>] (atomic_notifier_call_chain) from [<c030ae64>] (machine_restart+0x78/0x7c) [ 39.933551] [<c030ae64>] (machine_restart) from [<c0348048>] (__do_sys_reboot+0xdc/0x1e0) [ 39.942397] [<c0348048>] (__do_sys_reboot) from [<c0300060>] (ret_fast_syscall+0x0/0x54) [ 39.950721] Exception stack(0xc3e0bfa8 to 0xc3e0bff0) [ 39.958855] bfa0: 0001221c bed2fe24 fee1dead 28121969 01234567 00000000 [ 39.963832] bfc0: 0001221c bed2fe24 00000003 00000058 000225e0 00000000 00000000 00000000 [ 39.971985] bfe0: b6e62560 bed2fc84 00010fd8 b6e62580 [ 39.980124] ---[ end trace 3f578288bad866e4 ]--- Hence, replace msleep() with mdelay() to fix this issue. Fixes: `05e487d905` ("watchdog: qcom: register a restart notifier") Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20201207060005.21293-1-manivannan.sadhasivam@linaro.org Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:58 +01:00
Michael Ellerman	a3c1680828	powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug [ Upstream commit c1bea0a840ac75dca19bc6aa05575a33eb9fd058 ] Currently pmac32_defconfig with SMP=y doesn't build: arch/powerpc/platforms/powermac/smp.c: error: implicit declaration of function 'cleanup_cpu_mmu_context' It would be nice for consistency if all platforms clear mm_cpumask and flush TLBs on unplug, but the TLB invalidation bug described in commit `01b0f0eae0` ("powerpc/64s: Trim offlined CPUs from mm_cpumasks") only applies to 64s and for now we only have the TLB flush code for that platform. So just add an empty version for 32-bit Book3S. Fixes: `01b0f0eae0` ("powerpc/64s: Trim offlined CPUs from mm_cpumasks") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change log based on comments from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:58 +01:00
Zhang Qilong	0572a4aa74	libnvdimm/label: Return -ENXIO for no slot in __blk_label_update [ Upstream commit 4c46764733c85b82c07e9559b39da4d00a7dd659 ] Forget to set error code when nd_label_alloc_slot failed, and we add it to avoid overwritten error code. Fixes: `0ba1c63489` ("libnvdimm: write blk label set") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201205115056.2076523-1-zhangqilong3@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:58 +01:00
Tobias Klauser	0eecef0fec	devlink: use _BITUL() macro instead of BIT() in the UAPI header [ Upstream commit 75f4d4544db9fa34e1f04174f27d9f8a387be37d ] The BIT() macro is not available for the UAPI headers. Moreover, it can be defined differently in user space headers. Thus, replace its usage with the _BITUL() macro which is already used in other macro definitions in <linux/devlink.h>. Fixes: `dc64cc7c63` ("devlink: Add devlink reload limit option") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Link: https://lore.kernel.org/r/20201215102531.16958-1-tklauser@distanz.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:58 +01:00
Vincent Stehlé	027112b267	net: korina: fix return value [ Upstream commit 7eb000bdbe7c7da811ef51942b356f6e819b13ba ] The ndo_start_xmit() method must not attempt to free the skb to transmit when returning NETDEV_TX_BUSY. Therefore, make sure the korina_send_packet() function returns NETDEV_TX_OK when it frees a packet. Fixes: `ef11291bcd` ("Add support the Korina (IDT RC32434) Ethernet MAC") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201214220952.19935-1-vincent.stehle@laposte.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:57 +01:00
Trond Myklebust	de16a86c9d	NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read() [ Upstream commit 52104f274e2d7f134d34bab11cada8913d4544e2 ] Don't bump the index twice. Fixes: `563c53e73b` ("NFS: Fix flexfiles read failover") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:57 +01:00
Jack Wang	1b75aea3e3	block/rnbd-clt: Fix possible memleak [ Upstream commit 46067844efdb8275ade705923120fc5391543b53 ] In error case, we do not free the memory for blk_symlink_name. Do it by free the memory in error case, and set to NULL afterwards. Also fix the condition in rnbd_clt_remove_dev_symlink. Fixes: 64e8a6ece1a5 ("block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name") Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:57 +01:00
Md Haris Iqbal	996ce53a2a	block/rnbd-clt: Get rid of warning regarding size argument in strlcpy [ Upstream commit e7508d48565060af5d89f10cb83c9359c8ae1310 ] The kernel test robot triggerred the following warning, >> drivers/block/rnbd/rnbd-clt.c:1397:42: warning: size argument in 'strlcpy' call appears to be size of the source; expected the size of the destination [-Wstrlcpy-strlcat-size] strlcpy(dev->pathname, pathname, strlen(pathname) + 1); ~~~~~~~^~~~~~~~~~~~~ To get rid of the above warning, use a kstrdup as Bart suggested. Fixes: 64e8a6ece1a5 ("block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:57 +01:00
Christophe JAILLET	e50eea719f	net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove function [ Upstream commit 322e53d1e2529ae9d501f5e0f20604a79b873aef ] 'irq_of_parse_and_map()' should be balanced by a corresponding 'irq_dispose_mapping()' call. Otherwise, there is some resources leaks. Add such a call in the error handling path of the probe function and in the remove function. Fixes: `492205050d` ("net: Add EMAC ethernet driver found on Allwinner A10 SoC's") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20201214202117.146293-1-christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:57 +01:00
Christophe JAILLET	8f995afae9	net: mscc: ocelot: Fix a resource leak in the error handling path of the probe function [ Upstream commit f87675b836b324d270fd52f1f5e6d6bb9f4bd1d5 ] In case of error after calling 'ocelot_init()', it must be undone by a corresponding 'ocelot_deinit()' call, as already done in the remove function. Fixes: `a556c76adc` ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201213114838.126922-1-christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:57 +01:00
Christophe JAILLET	1e7524c981	net: bcmgenet: Fix a resource leak in an error handling path in the probe functin [ Upstream commit 4375ada01963d1ebf733d60d1bb6e5db401e1ac6 ] If the 'register_netdev()' call fails, we must undo a previous 'bcmgenet_mii_init()' call. Fixes: `1c1008c793` ("net: bcmgenet: add main driver file") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201212182005.120437-1-christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:57 +01:00
Ioana Ciornei	5c0109f779	dpaa2-eth: fix the size of the mapped SGT buffer [ Upstream commit 54a57d1c449275ee727154ac106ec1accae012e3 ] This patch fixes an error condition triggered when the code path which transmits a S/G frame descriptor when the skb's headroom is not enough for DPAA2's needs. We are greated with a splat like the one below when a SGT structure is recycled and that is because even though a dma_unmap is performed on the Tx confirmation path, the unmap is not done with the proper size. [ 714.464927] WARNING: CPU: 13 PID: 0 at drivers/iommu/io-pgtable-arm.c:281 __arm_lpae_map+0x2d4/0x30c (...) [ 714.465343] Call trace: [ 714.465348] __arm_lpae_map+0x2d4/0x30c [ 714.465353] __arm_lpae_map+0x114/0x30c [ 714.465357] __arm_lpae_map+0x114/0x30c [ 714.465362] __arm_lpae_map+0x114/0x30c [ 714.465366] arm_lpae_map+0xf4/0x180 [ 714.465373] arm_smmu_map+0x4c/0xc0 [ 714.465379] __iommu_map+0x100/0x2bc [ 714.465385] iommu_map_atomic+0x20/0x30 [ 714.465391] __iommu_dma_map+0xb0/0x110 [ 714.465397] iommu_dma_map_page+0xb8/0x120 [ 714.465404] dma_map_page_attrs+0x1a8/0x210 [ 714.465413] __dpaa2_eth_tx+0x384/0xbd0 [fsl_dpaa2_eth] [ 714.465421] dpaa2_eth_tx+0x84/0x134 [fsl_dpaa2_eth] [ 714.465427] dev_hard_start_xmit+0x10c/0x2b0 [ 714.465433] sch_direct_xmit+0x1a0/0x550 (...) The dpaa2-eth driver uses an area of software annotations to transmit necessary information from the Tx path to the Tx confirmation one. This SWA structure has a different layout for each kind of frame that we are dealing with: linear, S/G or XDP. The commit referenced was incorrectly setting up the 'sgt_size' field for the S/G type of SWA even though we are dealing with a linear skb here. Fixes: `d70446ee1f` ("dpaa2-eth: send a scatter-gather FD instead of realloc-ing") Reported-by: Daniel Thompson <daniel.thompson@linaro.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20201211171607.108034-1-ciorneiioana@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:56 +01:00
Oleksij Rempel	d50170ac30	net: dsa: qca: ar9331: fix sleeping function called from invalid context bug [ Upstream commit 3e47495fc4de4122598dd51ae8527b09b8209646 ] With lockdep enabled, we will get following warning: ar9331_switch ethernet.1:10 lan0 (uninitialized): PHY [!ahb!ethernet@1a000000!mdio!switch@10:00] driver [Qualcomm Atheros AR9331 built-in PHY] (irq=13) BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 18, name: kworker/0:1 INFO: lockdep is turned off. irq event stamp: 602 hardirqs last enabled at (601): [<8073fde0>] _raw_spin_unlock_irq+0x3c/0x80 hardirqs last disabled at (602): [<8073a4f4>] __schedule+0x184/0x800 softirqs last enabled at (0): [<80080f60>] copy_process+0x578/0x14c8 softirqs last disabled at (0): [<00000000>] 0x0 CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 5.10.0-rc3-ar9331-00734-g7d644991df0c #31 Workqueue: events deferred_probe_work_func Stack : 80980000 80980000 8089ef70 80890000 804b5414 80980000 00000002 80b53728 00000000 800d1268 804b5414 ffffffde 00000017 800afe08 81943860 0f5bfc32 00000000 00000000 8089ef70 819436c0 ffffffea 00000000 00000000 00000000 8194390c 808e353c 0000000f 66657272 80980000 00000000 00000000 80890000 804b5414 80980000 00000002 80b53728 00000000 00000000 00000000 80d40000 ... Call Trace: [<80069ce0>] show_stack+0x9c/0x140 [<800afe08>] ___might_sleep+0x220/0x244 [<8073bfb0>] __mutex_lock+0x70/0x374 [<8073c2e0>] mutex_lock_nested+0x2c/0x38 [<804b5414>] regmap_update_bits_base+0x38/0x8c [<804ee584>] regmap_update_bits+0x1c/0x28 [<804ee714>] ar9331_sw_unmask_irq+0x34/0x60 [<800d91f0>] unmask_irq+0x48/0x70 [<800d93d4>] irq_startup+0x114/0x11c [<800d65b4>] __setup_irq+0x4f4/0x6d0 [<800d68a0>] request_threaded_irq+0x110/0x190 [<804e3ef0>] phy_request_interrupt+0x4c/0xe4 [<804df508>] phylink_bringup_phy+0x2c0/0x37c [<804df7bc>] phylink_of_phy_connect+0x118/0x130 [<806c1a64>] dsa_slave_create+0x3d0/0x578 [<806bc4ec>] dsa_register_switch+0x934/0xa20 [<804eef98>] ar9331_sw_probe+0x34c/0x364 [<804eb48c>] mdio_probe+0x44/0x70 [<8049e3b4>] really_probe+0x30c/0x4f4 [<8049ea10>] driver_probe_device+0x264/0x26c [<8049bc10>] bus_for_each_drv+0xb4/0xd8 [<8049e684>] __device_attach+0xe8/0x18c [<8049ce58>] bus_probe_device+0x48/0xc4 [<8049db70>] deferred_probe_work_func+0xdc/0xf8 [<8009ff64>] process_one_work+0x2e4/0x4a0 [<800a0770>] worker_thread+0x2a8/0x354 [<800a774c>] kthread+0x16c/0x174 [<8006306c>] ret_from_kernel_thread+0x14/0x1c ar9331_switch ethernet.1:10 lan1 (uninitialized): PHY [!ahb!ethernet@1a000000!mdio!switch@10:02] driver [Qualcomm Atheros AR9331 built-in PHY] (irq=13) DSA: tree 0 setup To fix it, it is better to move access to MDIO register to the .irq_bus_sync_unlock call back. Fixes: `ec6698c272` ("net: dsa: add support for Atheros AR9331 built-in switch") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20201211110317.17061-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:56 +01:00
Björn Töpel	bc79bf6c58	i40e, xsk: clear the status bits for the next_to_use descriptor [ Upstream commit 64050b5b8706d304ba647591b06e1eddc55e8bd9 ] On the Rx side, the next_to_use index points to the next item in the HW ring to be refilled/allocated, and next_to_clean points to the next item to potentially be processed. When the HW Rx ring is fully refilled, i.e. no packets has been processed, the next_to_use will be next_to_clean - 1. When the ring is fully processed next_to_clean will be equal to next_to_use. The latter case is where a bug is triggered. If the next_to_use bits are not cleared, and the "fully processed" state is entered, a stale descriptor can be processed. The skb-path correctly clear the status bit for the next_to_use descriptor, but the AF_XDP zero-copy path did not do that. This change adds the status bits clearing of the next_to_use descriptor. Fixes: `3b4f0b66c2` ("i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:56 +01:00
Björn Töpel	0c3d87fa50	ice, xsk: clear the status bits for the next_to_use descriptor [ Upstream commit 8d14768a7972b92c73259f0c9c45b969d85e3a60 ] On the Rx side, the next_to_use index points to the next item in the HW ring to be refilled/allocated, and next_to_clean points to the next item to potentially be processed. When the HW Rx ring is fully refilled, i.e. no packets has been processed, the next_to_use will be next_to_clean - 1. When the ring is fully processed next_to_clean will be equal to next_to_use. The latter case is where a bug is triggered. If the next_to_use bits are not cleared, and the "fully processed" state is entered, a stale descriptor can be processed. The skb-path correctly clear the status bit for the next_to_use descriptor, but the AF_XDP zero-copy path did not do that. This change adds the status bits clearing of the next_to_use descriptor. Fixes: `2d4238f556` ("ice: Add support for AF_XDP") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:56 +01:00
Sven Van Asbroeck	f290c9bed1	lan743x: fix rx_napi_poll/interrupt ping-pong [ Upstream commit 57030a0b620f735bf557696e5ceb9f32c2b3bb8f ] Even if there is more rx data waiting on the chip, the rx napi poll fn will never run more than once - it will always read a few buffers, then bail out and re-arm interrupts. Which results in ping-pong between napi and interrupt. This defeats the purpose of napi, and is bad for performance. Fix by making the rx napi poll behave identically to other ethernet drivers: 1. initialize rx napi polling with an arbitrary budget (64). 2. in the polling fn, return full weight if rx queue is not depleted, this tells the napi core to "keep polling". 3. update the rx tail ("ring the doorbell") once for every 8 processed rx ring buffers. Thanks to Jakub Kicinski, Eric Dumazet and Andrew Lunn for their expert opinions and suggestions. Tested with 20 seconds of full bandwidth receive (iperf3): rx irqs softirqs(NET_RX) ----------------------------- before 23827 33620 after 129 4081 Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430 Fixes: `23f0703c12` ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com> Link: https://lore.kernel.org/r/20201215161954.5950-1-TheSven73@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:56 +01:00
Heiko Carstens	1856405862	s390/test_unwind: fix CALL_ON_STACK tests [ Upstream commit f22b9c219a798e1bf11110a3d2733d883e6da059 ] The CALL_ON_STACK tests use the no_dat stack to switch to a different stack for unwinding tests. If an interrupt or machine check happens while using that stack, and previously being on the async stack, the interrupt / machine check entry code (SWITCH_ASYNC) will assume that the previous context did not use the async stack and happily use the async stack again. This will lead to stack corruption of the previous context. To solve this disable both interrupts and machine checks before switching to the no_dat stack. Fixes: `7868249fbb` ("s390/test_unwind: add CALL_ON_STACK tests") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:56 +01:00
Dwaipayan Ray	0633094ec7	checkpatch: fix unescaped left brace [ Upstream commit 03f4935135b9efeb780b970ba023c201f81cf4e6 ] There is an unescaped left brace in a regex in OPEN_BRACE check. This throws a runtime error when checkpatch is run with --fix flag and the OPEN_BRACE check is executed. Fix it by escaping the left brace. Link: https://lkml.kernel.org/r/20201115202928.81955-1-dwaipayanray1@gmail.com Fixes: `8d1824780f` ("checkpatch: add --fix option for a couple OPEN_BRACE misuses") Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:56 +01:00
Alexey Dobriyan	b202ac9c73	proc: fix lookup in /proc/net subdirectories after setns(2) [ Upstream commit c6c75deda81344c3a95d1d1f606d5cee109e5d54 ] Commit `1fde6f21d9` ("proc: fix /proc/net/* after setns(2)") only forced revalidation of regular files under /proc/net/ However, /proc/net/ is unusual in the sense of /proc/net/foo handlers take netns pointer from parent directory which is old netns. Steps to reproduce: (void)open("/proc/net/sctp/snmp", O_RDONLY); unshare(CLONE_NEWNET); int fd = open("/proc/net/sctp/snmp", O_RDONLY); read(fd, &c, 1); Read will read wrong data from original netns. Patch forces lookup on every directory under /proc/net . Link: https://lkml.kernel.org/r/20201205160916.GA109739@localhost.localdomain Fixes: `1da4d377f9` ("proc: revalidate misc dentries") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:56 +01:00
Johannes Weiner	bd3f4b6fd9	mm: don't wake kswapd prematurely when watermark boosting is disabled [ Upstream commit 597c892038e08098b17ccfe65afd9677e6979800 ] On 2-node NUMA hosts we see bursts of kswapd reclaim and subsequent pressure spikes and stalls from cache refaults while there is plenty of free memory in the system. Usually, kswapd is woken up when all eligible nodes in an allocation are full. But the code related to watermark boosting can wake kswapd on one full node while the other one is mostly empty. This may be justified to fight fragmentation, but is currently unconditionally done whether watermark boosting is occurring or not. In our case, many of our workloads' throughput scales with available memory, and pure utilization is a more tangible concern than trends around longer-term fragmentation. As a result we generally disable watermark boosting. Wake kswapd only woken when watermark boosting is requested. Link: https://lkml.kernel.org/r/20201020175833.397286-1-hannes@cmpxchg.org Fixes: `1c30844d2d` ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:55 +01:00
Dan Carpenter	9b52a37fb3	hugetlb: fix an error code in hugetlb_reserve_pages() [ Upstream commit 7fc2513aa237e2ce239ab54d7b04d1d79b317110 ] Preserve the error code from region_add() instead of returning success. Link: https://lkml.kernel.org/r/X9NGZWnZl5/Mt99R@mwanda Fixes: `0db9d74ed8` ("hugetlb: disable region_add file_region coalescing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mina Almasry <almasrymina@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:55 +01:00
Oscar Salvador	b7bf8ed8d1	mm,memory_failure: always pin the page in madvise_inject_error [ Upstream commit 1e8aaedb182d6ddffc894b832e4962629907b3e0 ] madvise_inject_error() uses get_user_pages_fast to translate the address we specified to a page. After [1], we drop the extra reference count for memory_failure() path. That commit says that memory_failure wanted to keep the pin in order to take the page out of circulation. The truth is that we need to keep the page pinned, otherwise the page might be re-used after the put_page() and we can end up messing with someone else's memory. E.g: CPU0 process X CPU1 madvise_inject_error get_user_pages put_page page gets reclaimed process Y allocates the page memory_failure // We mess with process Y memory madvise() is meant to operate on a self address space, so messing with pages that do not belong to us seems the wrong thing to do. To avoid that, let us keep the page pinned for memory_failure as well. Pages for DAX mappings will release this extra refcount in memory_failure_dev_pagemap. [1] ("23e7b5c2e271: mm, madvise_inject_error: Let memory_failure() optionally take a page reference") Link: https://lkml.kernel.org/r/20201207094818.8518-1-osalvador@suse.de Fixes: `23e7b5c2e2` ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference") Signed-off-by: Oscar Salvador <osalvador@suse.de> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:55 +01:00
Vincenzo Frascino	23713b480d	mm/vmalloc.c: fix kasan shadow poisoning size [ Upstream commit c041098c690fe53cea5d20c62f128a4f7a5c19fe ] The size of vm area can be affected by the presence or not of the guard page. In particular when VM_NO_GUARD is present, the actual accessible size has to be considered like the real size minus the guard page. Currently kasan does not keep into account this information during the poison operation and in particular tries to poison the guard page as well. This approach, even if incorrect, does not cause an issue because the tags for the guard page are written in the shadow memory. With the future introduction of the Tag-Based KASAN, being the guard page inaccessible by nature, the write tag operation on this page triggers a fault. Fix kasan shadow poisoning size invoking get_vm_area_size() instead of accessing directly the field in the data structure to detect the correct value. Link: https://lkml.kernel.org/r/20201027160213.32904-1-vincenzo.frascino@arm.com Fixes: `d98c9e83b5` ("kasan: fix crashes on access to memory mapped by vm_map_ram()") Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:55 +01:00
Waiman Long	4a9d8b0789	mm/vmalloc: Fix unlock order in s_stop() [ Upstream commit 0a7dd4e901b8a4ee040ba953900d1d7120b34ee5 ] When multiple locks are acquired, they should be released in reverse order. For s_start() and s_stop() in mm/vmalloc.c, that is not the case. s_start: mutex_lock(&vmap_purge_lock); spin_lock(&vmap_area_lock); s_stop : mutex_unlock(&vmap_purge_lock); spin_unlock(&vmap_area_lock); This unlock sequence, though allowed, is not optimal. If a waiter is present, mutex_unlock() will need to go through the slowpath of waking up the waiter with preemption disabled. Fix that by releasing the spinlock first before the mutex. Link: https://lkml.kernel.org/r/20201213180843.16938-1-longman@redhat.com Fixes: `e36176be1c` ("mm/vmalloc: rework vmap_area_lock") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:55 +01:00
Matthew Wilcox (Oracle)	bbb7c059fd	sparc: fix handling of page table constructor failure [ Upstream commit 06517c9a336f4c20f2064611bf4b1e7881a95fe1 ] The page has just been allocated, so its refcount is 1. free_unref_page() is for use on pages which have a zero refcount. Use __free_page() like the other implementations of pte_alloc_one(). Link: https://lkml.kernel.org/r/20201125034655.27687-1-willy@infradead.org Fixes: `1ae9ae5f7d` ("sparc: handle pgtable_page_ctor() fail") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:55 +01:00
Shakeel Butt	dd156e3fca	mm/rmap: always do TTU_IGNORE_ACCESS [ Upstream commit 013339df116c2ee0d796dd8bfb8f293a2030c063 ] Since commit `369ea8242c` ("mm/rmap: update to new mmu_notifier semantic v2"), the code to check the secondary MMU's page table access bit is broken for !(TTU_IGNORE_ACCESS) because the page is unmapped from the secondary MMU's page table before the check. More specifically for those secondary MMUs which unmap the memory in mmu_notifier_invalidate_range_start() like kvm. However memory reclaim is the only user of !(TTU_IGNORE_ACCESS) or the absence of TTU_IGNORE_ACCESS and it explicitly performs the page table access check before trying to unmap the page. So, at worst the reclaim will miss accesses in a very short window if we remove page table access check in unmapping code. There is an unintented consequence of !(TTU_IGNORE_ACCESS) for the memcg reclaim. From memcg reclaim the page_referenced() only account the accesses from the processes which are in the same memcg of the target page but the unmapping code is considering accesses from all the processes, so, decreasing the effectiveness of memcg reclaim. The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping code. Link: https://lkml.kernel.org/r/20201104231928.1494083-1-shakeelb@google.com Fixes: `369ea8242c` ("mm/rmap: update to new mmu_notifier semantic v2") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:55 +01:00
Muchun Song	6d48fff6d3	mm: memcg/slab: fix use after free in obj_cgroup_charge [ Upstream commit eefbfa7fd678805b38a46293e78543f98f353d3e ] The rcu_read_lock/unlock only can guarantee that the memcg will not be freed, but it cannot guarantee the success of css_get to memcg. If the whole process of a cgroup offlining is completed between reading a objcg->memcg pointer and bumping the css reference on another CPU, and there are exactly 0 external references to this memory cgroup (how we get to the obj_cgroup_charge() then?), css_get() can change the ref counter from 0 back to 1. Link: https://lkml.kernel.org/r/20201028035013.99711-2-songmuchun@bytedance.com Fixes: `bf4f059954` ("mm: memcg/slab: obj_cgroup API") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:54 +01:00
Muchun Song	02314d05e8	mm: memcg/slab: fix return of child memcg objcg for root memcg [ Upstream commit 2f7659a314736b32b66273dbf91c19874a052fde ] Consider the following memcg hierarchy. root / \ A B If we failed to get the reference on objcg of memcg A, the get_obj_cgroup_from_current can return the wrong objcg for the root memcg. Link: https://lkml.kernel.org/r/20201029164429.58703-1-songmuchun@bytedance.com Fixes: `bf4f059954` ("mm: memcg/slab: obj_cgroup API") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Adrian Reber <areber@redhat.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:54 +01:00
Jason Gunthorpe	cfde6c1810	mm/gup: combine put_compound_head() and unpin_user_page() [ Upstream commit 4509b42c38963f495b49aa50209c34337286ecbe ] These functions accomplish the same thing but have different implementations. unpin_user_page() has a bug where it calls mod_node_page_state() after calling put_page() which creates a risk that the page could have been hot-uplugged from the system. Fix this by using put_compound_head() as the only implementation. __unpin_devmap_managed_user_page() and related can be deleted as well in favour of the simpler, but slower, version in put_compound_head() that has an extra atomic page_ref_sub, but always calls put_page() which internally contains the special devmap code. Move put_compound_head() to be directly after try_grab_compound_head() so people can find it in future. Link: https://lkml.kernel.org/r/0-v1-6730d4ee0d32+40e6-gup_combine_put_jgg@nvidia.com Fixes: `1970dc6f52` ("mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> CC: Joao Martins <joao.m.martins@oracle.com> CC: Jonathan Corbet <corbet@lwn.net> CC: Dan Williams <dan.j.williams@intel.com> CC: Dave Chinner <david@fromorbit.com> CC: Christoph Hellwig <hch@infradead.org> CC: Jane Chu <jane.chu@oracle.com> CC: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> CC: Michal Hocko <mhocko@suse.com> CC: Mike Kravetz <mike.kravetz@oracle.com> CC: Shuah Khan <shuah@kernel.org> CC: Muchun Song <songmuchun@bytedance.com> CC: Vlastimil Babka <vbabka@suse.cz> CC: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:54 +01:00
Jason Gunthorpe	537946556c	mm/gup: prevent gup_fast from racing with COW during fork [ Upstream commit 57efa1fe5957694fa541c9062de0a127f0b9acb0 ] Since commit `70e806e4e6` ("mm: Do early cow for pinned pages during fork() for ptes") pages under a FOLL_PIN will not be write protected during COW for fork. This means that pages returned from pin_user_pages(FOLL_WRITE) should not become write protected while the pin is active. However, there is a small race where get_user_pages_fast(FOLL_PIN) can establish a FOLL_PIN at the same time copy_present_page() is write protecting it: CPU 0 CPU 1 get_user_pages_fast() internal_get_user_pages_fast() copy_page_range() pte_alloc_map_lock() copy_present_page() atomic_read(has_pinned) == 0 page_maybe_dma_pinned() == false atomic_set(has_pinned, 1); gup_pgd_range() gup_pte_range() pte_t pte = gup_get_pte(ptep) pte_access_permitted(pte) try_grab_compound_head() pte = pte_wrprotect(pte) set_pte_at(); pte_unmap_unlock() // GUP now returns with a write protected page The first attempt to resolve this by using the write protect caused problems (and was missing a barrrier), see commit `f3c64eda3e` ("mm: avoid early COW write protect games during fork()") Instead wrap copy_p4d_range() with the write side of a seqcount and check the read side around gup_pgd_range(). If there is a collision then get_user_pages_fast() fails and falls back to slow GUP. Slow GUP is safe against this race because copy_page_range() is only called while holding the exclusive side of the mmap_lock on the src mm_struct. [akpm@linux-foundation.org: coding style fixes] Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com Link: https://lkml.kernel.org/r/2-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com Fixes: `f3c64eda3e` ("mm: avoid early COW write protect games during fork()") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: "Ahmed S. Darwish" <a.darwish@linutronix.de> [seqcount_t parts] Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:54 +01:00
Jason Gunthorpe	bcb0f647c1	mm/gup: reorganize internal_get_user_pages_fast() [ Upstream commit c28b1fc70390df32e29991eedd52bd86e7aba080 ] Patch series "Add a seqcount between gup_fast and copy_page_range()", v4. As discussed and suggested by Linus use a seqcount to close the small race between gup_fast and copy_page_range(). Ahmed confirms that raw_write_seqcount_begin() is the correct API to use in this case and it doesn't trigger any lockdeps. I was able to test it using two threads, one forking and the other using ibv_reg_mr() to trigger GUP fast. Modifying copy_page_range() to sleep made the window large enough to reliably hit to test the logic. This patch (of 2): The next patch in this series makes the lockless flow a little more complex, so move the entire block into a new function and remove a level of indention. Tidy a bit of cruft: - addr is always the same as start, so use start - Use the modern check_add_overflow() for computing end = start + len - nr_pinned/pages << PAGE_SHIFT needs the LHS to be unsigned long to avoid shift overflow, make the variables unsigned long to avoid coding casts in both places. nr_pinned was missing its cast - The handling of ret and nr_pinned can be streamlined a bit No functional change. Link: https://lkml.kernel.org/r/0-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com Link: https://lkml.kernel.org/r/1-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:54 +01:00
Alex Deucher	c51e3679eb	drm/amdgpu: fix regression in vbios reservation handling on headless [ Upstream commit 7eded018bfeccb365963bb51be731a9f99aeea59 ] We need to move the check under the non-headless case, otherwise we always reserve the VGA save size. Fixes: `157fe68d74` ("drm/amdgpu: fix size calculation with stolen vga memory") Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:53:54 +01:00

1 2 3 4 5 ...

969391 Commits