linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-24 04:00:52 +07:00

Author	SHA1	Message	Date
Pavel Skripkin	8006426ee3	media: drivers/media/usb: fix memory leak in zr364xx_probe [ Upstream commit 9c39be40c0155c43343f53e3a439290c0fec5542 ] syzbot reported memory leak in zr364xx_probe()[1]. The problem was in invalid error handling order. All error conditions rigth after v4l2_ctrl_handler_init() must call v4l2_ctrl_handler_free(). Reported-by: syzbot+efe9aefc31ae1e6f7675@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:55:30 +02:00
Dan Carpenter	472cd50dfc	media: zr364xx: fix memory leaks in probe() [ Upstream commit ea354b6ddd6f09be29424f41fa75a3e637fea234 ] Syzbot discovered that the probe error handling doesn't clean up the resources allocated in zr364xx_board_init(). There are several related bugs in this code so I have re-written the error handling. 1) Introduce a new function zr364xx_board_uninit() which cleans up the resources in zr364xx_board_init(). 2) In zr364xx_board_init() if the call to zr364xx_start_readpipe() fails then release the "cam->buffer.frame[i].lpvbits" memory before returning. This way every function either allocates everything successfully or it cleans up after itself. 3) Re-write the probe function so that each failure path goto frees the most recent allocation. That way we don't free anything before it has been allocated and we can also verify that everything is freed. 4) Originally, in the probe function the "cam->v4l2_dev.release" pointer was set to "zr364xx_release" near the start but I moved that assignment to the end, after everything had succeeded. The release function was never actually called during the probe cleanup process, but with this change I wanted to make it clear that we don't want to call zr364xx_release() until everything is allocated successfully. Next I re-wrote the zr364xx_release() function. Ideally this would have been a simple matter of copy and pasting the cleanup code from probe and adding an additional call to video_unregister_device(). But there are a couple quirks to note. 1) The probe function does not call videobuf_mmap_free() and I don't know where the videobuf_mmap is allocated. I left the code as-is to avoid introducing a bug in code I don't understand. 2) The zr364xx_board_uninit() has a call to zr364xx_stop_readpipe() which is a change from the original behavior with regards to unloading the driver. Calling zr364xx_stop_readpipe() on a stopped pipe is not a problem so this is safe and is potentially a bugfix. Reported-by: syzbot+b4d54814b339b5c6bbd4@syzkaller.appspotmail.com Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:55:30 +02:00
Evgeny Novikov	47d6f6be88	media: zr364xx: propagate errors from zr364xx_start_readpipe() [ Upstream commit af0321a5be3e5647441eb6b79355beaa592df97a ] zr364xx_start_readpipe() can fail but callers do not care about that. This can result in various negative consequences. The patch adds missed error handling. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Evgeny Novikov <novikov@ispras.ru> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:55:30 +02:00
Andreas Persson	f6e3858730	mtd: cfi_cmdset_0002: fix crash when erasing/writing AMD cards commit 2394e628738933aa014093d93093030f6232946d upstream. Erasing an AMD linear flash card (AM29F016D) crashes after the first sector has been erased. Likewise, writing to it crashes after two bytes have been written. The reason is a missing check for a null pointer - the cmdset_priv field is not set for this type of card. Fixes: `4844ef8030` ("mtd: cfi_cmdset_0002: Add support for polling status register") Signed-off-by: Andreas Persson <andreasp56@outlook.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/DB6P189MB05830B3530B8087476C5CFE4C1159@DB6P189MB0583.EURP189.PROD.OUTLOOK.COM Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:29 +02:00
Jouni Malinen	aee4d62c54	ath9k: Postpone key cache entry deletion for TXQ frames reference it commit ca2848022c12789685d3fab3227df02b863f9696 upstream. Do not delete a key cache entry that is still being referenced by pending frames in TXQs. This avoids reuse of the key cache entry while a frame might still be transmitted using it. To avoid having to do any additional operations during the main TX path operations, track pending key cache entries in a new bitmap and check whether any pending entries can be deleted before every new key add/remove operation. Also clear any remaining entries when stopping the interface. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-6-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:29 +02:00
Jouni Malinen	4a22c21811	ath: Modify ath_key_delete() to not need full key entry commit 144cd24dbc36650a51f7fe3bf1424a1432f1f480 upstream. tkip_keymap can be used internally to avoid the reference to key->cipher and with this, only the key index value itself is needed. This allows ath_key_delete() call to be postponed to be handled after the upper layer STA and key entry have already been removed. This is needed to make ath9k key cache management safer. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-5-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:29 +02:00
Jouni Malinen	956dd28033	ath: Export ath_hw_keysetmac() commit d2d3e36498dd8e0c83ea99861fac5cf9e8671226 upstream. ath9k is going to use this for safer management of key cache entries. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-4-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:29 +02:00
Jouni Malinen	f1d7864936	ath9k: Clear key cache explicitly on disabling hardware commit 73488cb2fa3bb1ef9f6cf0d757f76958bd4deaca upstream. Now that ath/key.c may not be explicitly clearing keys from the key cache, clear all key cache entries when disabling hardware to make sure no keys are left behind beyond this point. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-3-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:29 +02:00
Jouni Malinen	0a246f9f75	ath: Use safer key clearing with key cache entries commit 56c5485c9e444c2e85e11694b6c44f1338fc20fd upstream. It is possible for there to be pending frames in TXQs with a reference to the key cache entry that is being deleted. If such a key cache entry is cleared, those pending frame in TXQ might get transmitted without proper encryption. It is safer to leave the previously used key into the key cache in such cases. Instead, only clear the MAC address to prevent RX processing from using this key cache entry. This is needed in particularly in AP mode where the TXQs cannot be flushed on station disconnection. This change alone may not be able to address all cases where the key cache entry might get reused for other purposes immediately (the key cache entry should be released for reuse only once the TXQs do not have any remaining references to them), but this makes it less likely to get unprotected frames and the more complete changes may end up being significantly more complex. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201214172118.18100-2-jouni@codeaurora.org Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:29 +02:00
Ben Hutchings	f31d6232cc	net: dsa: microchip: ksz8795: Use software untagging on CPU port commit 9130c2d30c17846287b803a9803106318cbe5266 upstream. On the CPU port, we can support both tagged and untagged VLANs at the same time by doing any necessary untagging in software rather than hardware. To enable that, keep the CPU port's Remove Tag flag cleared and set the dsa_switch::untag_bridge_pvid flag. Fixes: `e66f840c08` ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backport to 5.10: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:17 +02:00
Ben Hutchings	03e2a695e8	net: dsa: microchip: ksz8795: Fix VLAN untagged flag change on deletion commit af01754f9e3c553a2ee63b4693c79a3956e230ab upstream. When a VLAN is deleted from a port, the flags in struct switchdev_obj_port_vlan are always 0. ksz8_port_vlan_del() copies the BRIDGE_VLAN_INFO_UNTAGGED flag to the port's Tag Removal flag, and therefore always clears it. In case there are multiple VLANs configured as untagged on this port - which seems useless, but is allowed - deleting one of them changes the remaining VLANs to be tagged. It's only ever necessary to change this flag when a VLAN is added to the port, so leave it unchanged in ksz8_port_vlan_del(). Fixes: `e66f840c08` ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backport to 5.10: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:16 +02:00
Ben Hutchings	2eada62c2f	net: dsa: microchip: ksz8795: Reject unsupported VLAN configuration commit 8f4f58f88fe0d9bd591f21f53de7dbd42baeb3fa upstream. The switches supported by ksz8795 only have a per-port flag for Tag Removal. This means it is not possible to support both tagged and untagged VLANs on the same port. Reject attempts to add a VLAN that requires the flag to be changed, unless there are no VLANs currently configured. VID 0 is excluded from this check since it is untagged regardless of the state of the flag. On the CPU port we could support tagged and untagged VLANs at the same time. This will be enabled by a later patch. Fixes: `e66f840c08` ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backport to 5.10: - This configuration has to be detected and rejected in the port_vlan_prepare operation - ksz8795_port_vlan_add() has to check again to decide whether to change the Tag Removal flag, so put the common condition in a separate function - Handle VID ranges] Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:16 +02:00
Ben Hutchings	e7b4c8dccc	net: dsa: microchip: ksz8795: Fix PVID tag insertion commit ef3b02a1d79b691f9a354c4903cf1e6917e315f9 upstream. ksz8795 has never actually enabled PVID tag insertion, and it also programmed the PVID incorrectly. To fix this: * Allow tag insertion to be controlled per ingress port. On most chips, set bit 2 in Global Control 19. On KSZ88x3 this control flag doesn't exist. * When adding a PVID: - Set the appropriate register bits to enable tag insertion on egress at every other port if this was the packet's ingress port. - Mask out the VID from the default tag, before or-ing in the new PVID. * When removing a PVID: - Clear the same control bits to disable tag insertion. - Don't update the default tag. This wasn't doing anything useful. Fixes: `e66f840c08` ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backport to 5.10: - Drop the KSZ88x3 cases as those chips are not supported here - Handle VID ranges in ksz8795_port_vlan_del()] Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:16 +02:00
Ben Hutchings	f805d58428	net: dsa: microchip: Fix probing KSZ87xx switch with DT node for host port The ksz8795 and ksz9477 drivers differ in the way they count ports. For ksz8795, ksz_device::port_cnt does not include the host port whereas for ksz9477 it does. This inconsistency was fixed in Linux 5.11 by a series of changes, but remains in 5.10-stable. When probing, the common code treats a port device node with an address >= dev->port_cnt as a fatal error. As a minimal fix, change it to compare again dev->mib_port_cnt. This is the length of the dev->ports array that the port number will be used to index, and always includes the host port. Cc: Woojung Huh <woojung.huh@microchip.com> Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com> Cc: Michael Grzeschik <m.grzeschik@pengutronix.de> Cc: Marek Vasut <marex@denx.de> Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:16 +02:00
Maxim Levitsky	015b27704c	KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656) commit c7dfa4009965a9b2d7b329ee970eb8da0d32f0bc upstream. If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor), then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only possible by making L0 intercept these instructions. Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted, and thus read/write portions of the host physical memory. Fixes: `89c8a4984f` ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:16 +02:00
Maxim Levitsky	f757530535	KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653) commit 0f923e07124df069ba68d8bb12324398f4b6b709 upstream. * Invert the mask of bits that we pick from L2 in nested_vmcb02_prepare_control * Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr This fixes a security issue that allowed a malicious L1 to run L2 with AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled AVIC to read/write the host physical memory at some offsets. Fixes: `3d6368ef58` ("KVM: SVM: Add VMRUN handler") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:16 +02:00
Nathan Chancellor	03c2eb669c	vmlinux.lds.h: Handle clang's module.{c,d}tor sections commit 848378812e40152abe9b9baf58ce2004f76fb988 upstream. A recent change in LLVM causes module_{c,d}tor sections to appear when CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings because these are not handled anywhere: ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN flag, so it is in a separate section even with -fno-function-sections (default)". Place them in the TEXT_TEXT section so that these technologies continue to work with the newer compiler versions. All of the KASAN and KCSAN KUnit tests continue to pass after this change. Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1432 Link: `7b78956224` Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Fangrui Song <maskray@google.com> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210731023107.1932981-1-nathan@kernel.org [nc: Resolve conflict due to lack of cf68fffb66d60] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:55:16 +02:00
Hans de Goede	163fa2c887	vboxsf: Add support for the atomic_open directory-inode op commit 52dfd86aa568e433b24357bb5fc725560f1e22d8 upstream. Opening a new file is done in 2 steps on regular filesystems: 1. Call the create inode-op on the parent-dir to create an inode to hold the meta-data related to the file. 2. Call the open file-op to get a handle for the file. vboxsf however does not really use disk-backed inodes because it is based on passing through file-related system-calls through to the hypervisor. So both steps translate to an open(2) call being passed through to the hypervisor. With the handle returned by the first call immediately being closed again. Making 2 open calls for a single open(..., O_CREATE, ...) calls has 2 problems: a) It is not really efficient. b) It actually breaks some apps. An example of b) is doing a git clone inside a vboxsf mount. When git clone tries to create a tempfile to store the pak files which is downloading the following happens: 1. vboxsf_dir_mkfile() gets called with a mode of 0444 and succeeds. 2. vboxsf_file_open() gets called with file->f_flags containing O_RDWR. When the host is a Linux machine this fails because doing a open(..., O_RDWR) on a file which exists and has mode 0444 results in an -EPERM error. Other network-filesystems and fuse avoid the problem of needing to pass 2 open() calls to the other side by using the atomic_open directory-inode op. This commit fixes git clone not working inside a vboxsf mount, by adding support for the atomic_open directory-inode op. As an added bonus this should also make opening new files faster. The atomic_open implementation is modelled after the atomic_open implementations from the 9p and fuse code. Fixes: `0fd1695766` ("fs: Add VirtualBox guest shared folder (vboxsf) support") Reported-by: Ludovic Pouzenc <bugreports@pouzenc.fr> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:41 +02:00
Hans de Goede	65f1eea8a3	vboxsf: Add vboxsf_[create\|release]_sf_handle() helpers commit 02f840f90764f22f5c898901849bdbf0cee752ba upstream. Factor out the code to create / release a struct vboxsf_handle into 2 new helper functions. This is a preparation patch for adding atomic_open support. Fixes: `0fd1695766` ("fs: Add VirtualBox guest shared folder (vboxsf) support") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:41 +02:00
Sean Christopherson	d37b35729b	KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF commit 18712c13709d2de9516c5d3414f707c4f0a9c190 upstream. Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults. Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit `a0c134347b` ("KVM: VMX: introduce vmx_need_pf_intercept"). Fixes: `1dbf5d68af` ("KVM: VMX: Add guest physical address check in EPT violation and misconfig") Cc: stable@vger.kernel.org Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812045615.3167686-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:41 +02:00
Sean Christopherson	1a9732ef1a	KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation commit 7b9cae027ba3aaac295ae23a62f47876ed97da73 upstream. Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to effectively get the controls for the current VMCS, as opposed to using vmx->secondary_exec_controls, which is the cached value of KVM's desired controls for vmcs01 and truly not reflective of any particular VMCS. While the waitpkg control is not dynamic, i.e. vmcs01 will always hold the same waitpkg configuration as vmx->secondary_exec_controls, the same does not hold true for vmcs02 if the L1 VMM hides the feature from L2. If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL, L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP. Fixes: `6e3ba4abce` ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:41 +02:00
Ard Biesheuvel	96617eadb7	efi/libstub: arm64: Double check image alignment at entry commit c32ac11da3f83bb42b986702a9b92f0a14ed4182 upstream. On arm64, the stub only moves the kernel image around in memory if needed, which is typically only for KASLR, given that relocatable kernels (which is the default) can run from any 64k aligned address, which is also the minimum alignment communicated to EFI via the PE/COFF header. Unfortunately, some loaders appear to ignore this header, and load the kernel at some arbitrary offset in memory. We can deal with this, but let's check for this condition anyway, so non-compliant code can be spotted and fixed. Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:41 +02:00
Christophe Leroy	c85e84d206	powerpc/smp: Fix OOPS in topology_init() commit 8241461536f21bbe51308a6916d1c9fb2e6b75a7 upstream. Running an SMP kernel on an UP platform not prepared for it, I encountered the following OOPS: BUG: Kernel NULL pointer dereference on read at 0x00000034 Faulting instruction address: 0xc0a04110 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K SMP NR_CPUS=2 CMPCPRO Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-pmac-00001-g230fedfaad21 #5234 NIP: c0a04110 LR: c0a040d8 CTR: c0a04084 REGS: e100dda0 TRAP: 0300 Not tainted (5.13.0-pmac-00001-g230fedfaad21) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 84000284 XER: 00000000 DAR: 00000034 DSISR: 20000000 GPR00: c0006bd4 e100de60 c1033320 00000000 00000000 c0942274 00000000 00000000 GPR08: 00000000 00000000 00000001 00000063 00000007 00000000 c0006f30 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000005 GPR24: c0c67d74 c0c67f1c c0c60000 c0c67d70 c0c0c558 1efdf000 c0c00020 00000000 NIP [c0a04110] topology_init+0x8c/0x138 LR [c0a040d8] topology_init+0x54/0x138 Call Trace: [e100de60] [80808080] 0x80808080 (unreliable) [e100de90] [c0006bd4] do_one_initcall+0x48/0x1bc [e100def0] [c0a0150c] kernel_init_freeable+0x1c8/0x278 [e100df20] [c0006f44] kernel_init+0x14/0x10c [e100df30] [c00190fc] ret_from_kernel_thread+0x14/0x1c Instruction dump: 7c692e70 7d290194 7c035040 7c7f1b78 5529103a 546706fe 5468103a 39400001 7c641b78 40800054 80c690b4 7fb9402e <81060034> 7fbeea14 2c080000 7fa3eb78 ---[ end trace b246ffbc6bbbb6fb ]--- Fix it by checking smp_ops before using it, as already done in several other places in the arch/powerpc/kernel/smp.c Fixes: `39f8756145` ("powerpc/smp: Move ppc_md.cpu_die() to smp_ops.cpu_offline_self()") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/75287841cbb8740edd44880fe60be66d489160d9.1628097995.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:41 +02:00
Thomas Gleixner	6706140e2c	PCI/MSI: Protect msi_desc::masked for multi-MSI commit 77e89afc25f30abd56e76a809ee2884d7c1b63ce upstream. Multi-MSI uses a single MSI descriptor and there is a single mask register when the device supports per vector masking. To avoid reading back the mask register the value is cached in the MSI descriptor and updates are done by clearing and setting bits in the cache and writing it to the device. But nothing protects msi_desc::masked and the mask register from being modified concurrently on two different CPUs for two different Linux interrupts which belong to the same multi-MSI descriptor. Add a lock to struct device and protect any operation on the mask and the mask register with it. This makes the update of msi_desc::masked unconditional, but there is no place which requires a modification of the hardware register without updating the masked cache. msi_mask_irq() is now an empty wrapper which will be cleaned up in follow up changes. The problem goes way back to the initial support of multi-MSI, but picking the commit which introduced the mask cache is a valid cut off point (2.6.30). Fixes: `f2440d9acb` ("PCI MSI: Refactor interrupt masking code") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:41 +02:00
Thomas Gleixner	76e5acf512	PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() commit d28d4ad2a1aef27458b3383725bb179beb8d015c upstream. No point in using the raw write function from shutdown. Preparatory change to introduce proper serialization for the msi_desc::masked cache. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.674391354@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Thomas Gleixner	310307afc1	PCI/MSI: Correct misleading comments commit 689e6b5351573c38ccf92a0dd8b3e2c2241e4aff upstream. The comments about preserving the cached state in pci_msi[x]_shutdown() are misleading as the MSI descriptors are freed right after those functions return. So there is nothing to restore. Preparatory change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.621609423@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Thomas Gleixner	51836e7353	PCI/MSI: Do not set invalid bits in MSI mask commit 361fd37397f77578735907341579397d5bed0a2d upstream. msi_mask_irq() takes a mask and a flags argument. The mask argument is used to mask out bits from the cached mask and the flags argument to set bits. Some places invoke it with a flags argument which sets bits which are not used by the device, i.e. when the device supports up to 8 vectors a full unmask in some places sets the mask to 0xFFFFFF00. While devices probably do not care, it's still bad practice. Fixes: `7ba1930db0` ("PCI MSI: Unmask MSI if setup failed") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.568173099@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Thomas Gleixner	5e868099f0	PCI/MSI: Enforce MSI[X] entry updates to be visible commit b9255a7cb51754e8d2645b65dd31805e282b4f3e upstream. Nothing enforces the posted writes to be visible when the function returns. Flush them even if the flush might be redundant when the entry is masked already as the unmask will flush as well. This is either setup or a rare affinity change event so the extra flush is not the end of the world. While this is more a theoretical issue especially the logic in the X86 specific msi_set_affinity() function relies on the assumption that the update has reached the hardware when the function returns. Again, as this never has been enforced the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.515188147@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Thomas Gleixner	619c6e5d0a	PCI/MSI: Enforce that MSI-X table entry is masked for update commit da181dc974ad667579baece33c2c8d2d1e4558d5 upstream. The specification (PCIe r5.0, sec 6.1.4.5) states: For MSI-X, a function is permitted to cache Address and Data values from unmasked MSI-X Table entries. However, anytime software unmasks a currently masked MSI-X Table entry either by clearing its Mask bit or by clearing the Function Mask bit, the function must update any Address or Data values that it cached from that entry. If software changes the Address or Data value of an entry while the entry is unmasked, the result is undefined. The Linux kernel's MSI-X support never enforced that the entry is masked before the entry is modified hence the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Enforce the entry to be masked across the update. There is no point in enforcing this to be handled at all possible call sites as this is just pointless code duplication and the common update function is the obvious place to enforce this. Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Reported-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.462096385@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Thomas Gleixner	6a8bddae2a	PCI/MSI: Mask all unused MSI-X entries commit 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f upstream. When MSI-X is enabled the ordering of calls is: msix_map_region(); msix_setup_entries(); pci_msi_setup_msi_irqs(); msix_program_entries(); This has a few interesting issues: 1) msix_setup_entries() allocates the MSI descriptors and initializes them except for the msi_desc:masked member which is left zero initialized. 2) pci_msi_setup_msi_irqs() allocates the interrupt descriptors and sets up the MSI interrupts which ends up in pci_write_msi_msg() unless the interrupt chip provides its own irq_write_msi_msg() function. 3) msix_program_entries() does not do what the name suggests. It solely updates the entries array (if not NULL) and initializes the masked member for each MSI descriptor by reading the hardware state and then masks the entry. Obviously this has some issues: 1) The uninitialized masked member of msi_desc prevents the enforcement of masking the entry in pci_write_msi_msg() depending on the cached masked bit. Aside of that half initialized data is a NONO in general 2) msix_program_entries() only ensures that the actually allocated entries are masked. This is wrong as experimentation with crash testing and crash kernel kexec has shown. This limited testing unearthed that when the production kernel had more entries in use and unmasked when it crashed and the crash kernel allocated a smaller amount of entries, then a full scan of all entries found unmasked entries which were in use in the production kernel. This is obviously a device or emulation issue as the device reset should mask all MSI-X table entries, but obviously that's just part of the paper specification. Cure this by: 1) Masking all table entries in hardware 2) Initializing msi_desc::masked in msix_setup_entries() 3) Removing the mask dance in msix_program_entries() 4) Renaming msix_program_entries() to msix_update_entries() to reflect the purpose of that function. As the masking of unused entries has never been done the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.403833459@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Thomas Gleixner	5653b5bb00	PCI/MSI: Enable and mask MSI-X early commit 438553958ba19296663c6d6583d208dfb6792830 upstream. The ordering of MSI-X enable in hardware is dysfunctional: 1) MSI-X is disabled in the control register 2) Various setup functions 3) pci_msi_setup_msi_irqs() is invoked which ends up accessing the MSI-X table entries 4) MSI-X is enabled and masked in the control register with the comment that enabling is required for some hardware to access the MSI-X table Step #4 obviously contradicts #3. The history of this is an issue with the NIU hardware. When #4 was introduced the table access actually happened in msix_program_entries() which was invoked after enabling and masking MSI-X. This was changed in commit `d71d6432e1` ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") which removed the table write from msix_program_entries(). Interestingly enough nobody noticed and either NIU still works or it did not get any testing with a kernel 3.19 or later. Nevertheless this is inconsistent and there is no reason why MSI-X can't be enabled and masked in the control register early on, i.e. move step #4 above to step #1. This preserves the NIU workaround and has no side effects on other hardware. Fixes: `d71d6432e1` ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.344136412@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Ben Dai	11635f6e64	genirq/timings: Prevent potential array overflow in __irq_timings_store() commit b9cc7d8a4656a6e815852c27ab50365009cb69c1 upstream. When the interrupt interval is greater than 2 ^ PREDICTION_BUFFER_SIZE * PREDICTION_FACTOR us and less than 1s, the calculated index will be greater than the length of irqs->ema_time[]. Check the calculated index before using it to prevent array overflow. Fixes: `23aa3b9a6b` ("genirq/timings: Encapsulate storing function") Signed-off-by: Ben Dai <ben.dai@unisoc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210425150903.25456-1-ben.dai9703@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Bixuan Cui	22e0660e40	genirq/msi: Ensure deactivation on teardown commit dbbc93576e03fbe24b365fab0e901eb442237a8a upstream. msi_domain_alloc_irqs() invokes irq_domain_activate_irq(), but msi_domain_free_irqs() does not enforce deactivation before tearing down the interrupts. This happens when PCI/MSI interrupts are set up and never used before being torn down again, e.g. in error handling pathes. The only place which cleans that up is the error handling path in msi_domain_alloc_irqs(). Move the cleanup from msi_domain_alloc_irqs() into msi_domain_free_irqs() to cure that. Fixes: `f3b0946d62` ("genirq/msi: Make sure PCI MSIs are activated early") Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210518033117.78104-1-cuibixuan@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Babu Moger	247681e8ac	x86/resctrl: Fix default monitoring groups reporting commit 064855a69003c24bd6b473b367d364e418c57625 upstream. Creating a new sub monitoring group in the root /sys/fs/resctrl leads to getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on the entire filesystem. Steps to reproduce: 1. mount -t resctrl resctrl /sys/fs/resctrl/ 2. cd /sys/fs/resctrl/ 3. cat mon_data/mon_L3_00/mbm_total_bytes 23189832 4. Create sub monitor group: mkdir mon_groups/test1 5. cat mon_data/mon_L3_00/mbm_total_bytes Unavailable When a new monitoring group is created, a new RMID is assigned to the new group. But the RMID is not active yet. When the events are read on the new RMID, it is expected to report the status as "Unavailable". When the user reads the events on the default monitoring group with multiple subgroups, the events on all subgroups are consolidated together. Currently, if any of the RMID reads report as "Unavailable", then everything will be reported as "Unavailable". Fix the issue by discarding the "Unavailable" reads and reporting all the successful RMID reads. This is not a problem on Intel systems as Intel reports 0 on Inactive RMIDs. Fixes: `d89b737901` ("x86/intel_rdt/cqm: Add mon_data") Reported-by: Paweł Szulik <pawel.szulik@intel.com> Signed-off-by: Babu Moger <Babu.Moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311 Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Thomas Gleixner	38fa3f21d8	x86/ioapic: Force affinity setup before startup commit 0c0e37dc11671384e53ba6ede53a4d91162a2cc5 upstream. The IO/APIC cannot handle interrupt affinity changes safely after startup other than from an interrupt handler. The startup sequence in the generic interrupt code violates that assumption. Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time. Fixes: `1840475676` ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.832143400@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Thomas Gleixner	3cd8aa1ec7	x86/msi: Force affinity setup before startup commit ff363f480e5997051dd1de949121ffda3b753741 upstream. The X86 MSI mechanism cannot handle interrupt affinity changes safely after startup other than from an interrupt handler, unless interrupt remapping is enabled. The startup sequence in the generic interrupt code violates that assumption. Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time. While the interrupt remapping MSI chip does not require this, there is no point in treating it differently as this might spare an interrupt to a CPU which is not in the default affinity mask. For the non-remapping case go to the direct write path when the interrupt is not yet started similar to the not yet activated case. Fixes: `1840475676` ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.886722080@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:40 +02:00
Thomas Gleixner	73a17232d1	genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP commit 826da771291fc25a428e871f9e7fb465e390f852 upstream. X86 IO/APIC and MSI interrupts (when used without interrupts remapping) require that the affinity setup on startup is done before the interrupt is enabled for the first time as the non-remapped operation mode cannot safely migrate enabled interrupts from arbitrary contexts. Provide a new irq chip flag which allows affected hardware to request this. This has to be opt-in because there have been reports in the past that some interrupt chips cannot handle affinity setting before startup. Fixes: `1840475676` ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.779791738@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 18:54:39 +02:00
Randy Dunlap	c68dba5e2b	x86/tools: Fix objdump version check again [ Upstream commit 839ad22f755132838f406751439363c07272ad87 ] Skip (omit) any version string info that is parenthesized. Warning: objdump version 15) is older than 2.19 Warning: Skipping posttest. where 'objdump -v' says: GNU objdump (GNU Binutils; SUSE Linux Enterprise 15) 2.35.1.20201123-7.18 Fixes: `8bee738bb1` ("x86: Fix objdump version check in chkobjdump.awk for different formats.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210731000146.2720-1-rdunlap@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Pu Lehui	be3a189ff9	powerpc/kprobes: Fix kprobe Oops happens in booke [ Upstream commit 43e8f76006592cb1573a959aa287c45421066f9c ] When using kprobe on powerpc booke series processor, Oops happens as show bellow: / # echo "p:myprobe do_nanosleep" > /sys/kernel/debug/tracing/kprobe_events / # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable / # sleep 1 [ 50.076730] Oops: Exception in kernel mode, sig: 5 [#1] [ 50.077017] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 [ 50.077221] Modules linked in: [ 50.077462] CPU: 0 PID: 77 Comm: sleep Not tainted 5.14.0-rc4-00022-g251a1524293d #21 [ 50.077887] NIP: c0b9c4e0 LR: c00ebecc CTR: 00000000 [ 50.078067] REGS: c3883de0 TRAP: 0700 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.078349] MSR: 00029000 <CE,EE,ME> CR: 24000228 XER: 20000000 [ 50.078675] [ 50.078675] GPR00: c00ebdf0 c3883e90 c313e300 c3883ea0 00000001 00000000 c3883ecc 00000001 [ 50.078675] GPR08: c100598c c00ea250 00000004 00000000 24000222 102490c2 bff4180c 101e60d4 [ 50.078675] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000 [ 50.078675] GPR24: 00000002 00000000 c3883ea0 00000001 00000000 0000c350 3b9b8d50 00000000 [ 50.080151] NIP [c0b9c4e0] do_nanosleep+0x0/0x190 [ 50.080352] LR [c00ebecc] hrtimer_nanosleep+0x14c/0x1e0 [ 50.080638] Call Trace: [ 50.080801] [c3883e90] [c00ebdf0] hrtimer_nanosleep+0x70/0x1e0 (unreliable) [ 50.081110] [c3883f00] [c00ec004] sys_nanosleep_time32+0xa4/0x110 [ 50.081336] [c3883f40] [c001509c] ret_from_syscall+0x0/0x28 [ 50.081541] --- interrupt: c00 at 0x100a4d08 [ 50.081749] NIP: 100a4d08 LR: 101b5234 CTR: 00000003 [ 50.081931] REGS: c3883f50 TRAP: 0c00 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.082183] MSR: 0002f902 <CE,EE,PR,FP,ME> CR: 24000222 XER: 00000000 [ 50.082457] [ 50.082457] GPR00: 000000a2 bf980040 1024b4d0 bf980084 bf980084 64000000 00555345 fefefeff [ 50.082457] GPR08: 7f7f7f7f 101e0000 00000069 00000003 28000422 102490c2 bff4180c 101e60d4 [ 50.082457] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000 [ 50.082457] GPR24: 00000002 bf9803f4 10240000 00000000 00000000 100039e0 00000000 102444e8 [ 50.083789] NIP [100a4d08] 0x100a4d08 [ 50.083917] LR [101b5234] 0x101b5234 [ 50.084042] --- interrupt: c00 [ 50.084238] Instruction dump: [ 50.084483] 4bfffc40 60000000 60000000 60000000 9421fff0 39400402 914200c0 38210010 [ 50.084841] 4bfffc20 00000000 00000000 00000000 <7fe00008> 7c0802a6 7c892378 93c10048 [ 50.085487] ---[ end trace f6fffe98e2fa8f3e ]--- [ 50.085678] Trace/breakpoint trap There is no real mode for booke arch and the MMU translation is always on. The corresponding MSR_IS/MSR_DS bit in booke is used to switch the address space, but not for real mode judgment. Fixes: `21f8b2fa3c` ("powerpc/kprobes: Ignore traps that happened in real mode") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210809023658.218915-1-pulehui@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Ard Biesheuvel	3231e1aeb8	efi/libstub: arm64: Relax 2M alignment again for relocatable kernels [ Upstream commit 3a262423755b83a5f85009ace415d6e7f572dfe8 ] Commit `82046702e2` ("efi/libstub/arm64: Replace 'preferred' offset with alignment check") simplified the way the stub moves the kernel image around in memory before booting it, given that a relocatable image does not need to be copied to a 2M aligned offset if it was loaded on a 64k boundary by EFI. Commit `d32de9130f` ("efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure") inadvertently defeated this logic by overriding the value of efi_nokaslr if EFI_RNG_PROTOCOL is not available, which was mistaken by the loader logic as an explicit request on the part of the user to disable KASLR and any associated relocation of an Image not loaded on a 2M boundary. So let's reinstate this functionality, by capturing the value of efi_nokaslr at function entry to choose the minimum alignment. Fixes: `d32de9130f` ("efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Ard Biesheuvel	9bb6c703b3	efi/libstub: arm64: Force Image reallocation if BSS was not reserved [ Upstream commit 5b94046efb4706b3429c9c8e7377bd8d1621d588 ] Distro versions of GRUB replace the usual LoadImage/StartImage calls used to load the kernel image with some local code that fails to honor the allocation requirements described in the PE/COFF header, as it does not account for the image's BSS section at all: it fails to allocate space for it, and fails to zero initialize it. Since the EFI stub itself is allocated in the .init segment, which is in the middle of the image, its BSS section is not impacted by this, and the main consequence of this omission is that the BSS section may overlap with memory regions that are already used by the firmware. So let's warn about this condition, and force image reallocation to occur in this case, which works around the problem. Fixes: `82046702e2` ("efi/libstub/arm64: Replace 'preferred' offset with alignment check") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Benjamin Herrenschmidt	1d1a897e3f	arm64: efi: kaslr: Fix occasional random alloc (and boot) failure [ Upstream commit 4152433c397697acc4b02c4a10d17d5859c2730d ] The EFI stub random allocator used for kaslr on arm64 has a subtle bug. In function get_entry_num_slots() which counts the number of possible allocation "slots" for the image in a given chunk of free EFI memory, "last_slot" can become negative if the chunk is smaller than the requested allocation size. The test "if (first_slot > last_slot)" doesn't catch it because both first_slot and last_slot are unsigned. I chose not to make them signed to avoid problems if this is ever used on architectures where there are meaningful addresses with the top bit set. Instead, fix it with an additional test against the allocation size. This can cause a boot failure in addition to a loss of randomisation due to another bug in the arm64 stub fixed separately. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Fixes: `2ddbfc81ea` ("efi: stub: add implementation of efi_random_alloc()") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Xie Yongji	d6742e17f6	nbd: Aovid double completion of a request [ Upstream commit cddce01160582a5f52ada3da9626c052d852ec42 ] There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: `96d97e1782` ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <jiangyadong@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Longpeng(Mike)	f7eb42acb1	vsock/virtio: avoid potential deadlock when vsock device remove [ Upstream commit 49b0b6ffe20c5344f4173f3436298782a08da4f2 ] There's a potential deadlock case when remove the vsock device or process the RESET event: vsock_for_each_connected_socket: spin_lock_bh(&vsock_table_lock) ----------- (1) ... virtio_vsock_reset_sock: lock_sock(sk) --------------------- (2) ... spin_unlock_bh(&vsock_table_lock) lock_sock() may do initiative schedule when the 'sk' is owned by other thread at the same time, we would receivce a warning message that "scheduling while atomic". Even worse, if the next task (selected by the scheduler) try to release a 'sk', it need to request vsock_table_lock and the deadlock occur, cause the system into softlockup state. Call trace: queued_spin_lock_slowpath vsock_remove_bound vsock_remove_sock virtio_transport_release __vsock_release vsock_release __sock_release sock_close __fput ____fput So we should not require sk_lock in this case, just like the behavior in vhost_vsock or vmci. Fixes: `0ea9e1d3a9` ("VSOCK: Introduce virtio_transport.ko") Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Maximilian Heyne	a905bd1c21	xen/events: Fix race in set_evtchn_to_irq [ Upstream commit 88ca2521bd5b4e8b83743c01a2d4cb09325b51e9 ] There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq mapping are lazily allocated in this function. The check whether the row is already present and the row initialization is not synchronized. Two threads can at the same time allocate a new row for evtchn_to_irq and add the irq mapping to the their newly allocated row. One thread will overwrite what the other has set for evtchn_to_irq[row] and therefore the irq mapping is lost. This will trigger a BUG_ON later in bind_evtchn_to_cpu: INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802 INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002) INFO: nvme nvme77: 1/0/0 default/read/poll queues CRIT: kernel BUG at drivers/xen/events/events_base.c:427! WARN: invalid opcode: 0000 [#1] SMP NOPTI WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme] WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0 WARN: Call Trace: WARN: set_affinity_irq+0x121/0x150 WARN: irq_do_set_affinity+0x37/0xe0 WARN: irq_setup_affinity+0xf6/0x170 WARN: irq_startup+0x64/0xe0 WARN: __setup_irq+0x69e/0x740 WARN: ? request_threaded_irq+0xad/0x160 WARN: request_threaded_irq+0xf5/0x160 WARN: ? nvme_timeout+0x2f0/0x2f0 [nvme] WARN: pci_request_irq+0xa9/0xf0 WARN: ? pci_alloc_irq_vectors_affinity+0xbb/0x130 WARN: queue_request_irq+0x4c/0x70 [nvme] WARN: nvme_reset_work+0x82d/0x1550 [nvme] WARN: ? check_preempt_wakeup+0x14f/0x230 WARN: ? check_preempt_curr+0x29/0x80 WARN: ? nvme_irq_check+0x30/0x30 [nvme] WARN: process_one_work+0x18e/0x3c0 WARN: worker_thread+0x30/0x3a0 WARN: ? process_one_work+0x3c0/0x3c0 WARN: kthread+0x113/0x130 WARN: ? kthread_park+0x90/0x90 WARN: ret_from_fork+0x3a/0x50 This patch sets evtchn_to_irq rows via a cmpxchg operation so that they will be set only once. The row is now cleared before writing it to evtchn_to_irq in order to not create a race once the row is visible for other threads. While at it, do not require the page to be zeroed, because it will be overwritten with -1's in clear_evtchn_to_irq_row anyway. Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Fixes: `d0b075ffee` ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated") Link: https://lore.kernel.org/r/20210812130930.127134-1-mheyne@amazon.de Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Matt Roper	194473495c	drm/i915: Only access SFC_DONE when media domain is not fused off [ Upstream commit 24d032e2359e3abc926b3d423f49a7c33e0b7836 ] The SFC_DONE register lives within the corresponding VD0/VD2/VD4/VD6 forcewake domain and is not accessible if the vdbox in that domain is fused off and the forcewake is not initialized. This mistake went unnoticed because until recently we were using the wrong register offset for the SFC_DONE register; once the register offset was corrected, we started hitting errors like <4> [544.989065] i915 0000:cc:00.0: Uninitialized forcewake domain(s) 0x80 accessed at 0x1ce000 on parts with fused-off vdbox engines. Fixes: `e50dbdbfd9` ("drm/i915/tgl: Add SFC instdone to error state") Fixes: 9c9c6d0ab08a ("drm/i915: Correct SFC_DONE register offset") Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210806174130.1058960-1-matthew.d.roper@intel.com Reviewed-by: José Roberto de Souza <jose.souza@intel.com> (cherry picked from commit c5589bb5dccb0c5cb74910da93663f489589f3ce) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Changed Fixes tag to match the cherry-picked 82929a2140eb] Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Eric Dumazet	ad42f5a420	net: igmp: increase size of mr_ifc_count [ Upstream commit b69dd5b3780a7298bd893816a09da751bc0636f7 ] Some arches support cmpxchg() on 4-byte and 8-byte only. Increase mr_ifc_count width to 32bit to fix this problem. Fixes: 4a2b285e7e10 ("net: igmp: fix data-race in igmp_ifc_timer_expire()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210811195715.3684218-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Neal Cardwell	f5c79ec040	tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets [ Upstream commit 6de035fec045f8ae5ee5f3a02373a18b939e91fb ] Currently if BBR congestion control is initialized after more than 2B packets have been delivered, depending on the phase of the tp->delivered counter the tracking of BBR round trips can get stuck. The bug arises because if tp->delivered is between 2^31 and 2^32 at the time the BBR congestion control module is initialized, then the initialization of bbr->next_rtt_delivered to 0 will cause the logic to believe that the end of the round trip is still billions of packets in the future. More specifically, the following check will fail repeatedly: !before(rs->prior_delivered, bbr->next_rtt_delivered) and thus the connection will take up to 2B packets delivered before that check will pass and the connection will set: bbr->round_start = 1; This could cause many mechanisms in BBR to fail to trigger, for example bbr_check_full_bw_reached() would likely never exit STARTUP. This bug is 5 years old and has not been observed, and as a practical matter this would likely rarely trigger, since it would require transferring at least 2B packets, or likely more than 3 terabytes of data, before switching congestion control algorithms to BBR. This patch is a stable candidate for kernels as far back as v4.9, when tcp_bbr.c was added. Fixes: `0f8782ea14` ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Kevin Yang <yyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:39 +02:00
Willy Tarreau	48119ffba8	net: linkwatch: fix failure to restore device state across suspend/resume [ Upstream commit 6922110d152e56d7569616b45a1f02876cf3eb9f ] After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed that my Ethernet port to which a bond and a VLAN interface are attached appeared to remain up after resuming from suspend with the cable unplugged (and that problem still persists with 5.10-LTS). It happens that the following happens: - the network driver (e1000e here) prepares to suspend, calls e1000e_down() which calls netif_carrier_off() to signal that the link is going down. - netif_carrier_off() adds a link_watch event to the list of events for this device - the device is completely stopped. - the machine suspends - the cable is unplugged and the machine brought to another location - the machine is resumed - the queued linkwatch events are processed for the device - the device doesn't yet have the __LINK_STATE_PRESENT bit and its events are silently dropped - the device is resumed with its link down - the upper VLAN and bond interfaces are never notified that the link had been turned down and remain up - the only way to provoke a change is to physically connect the machine to a port and possibly unplug it. The state after resume looks like this: $ ip -br li \| egrep 'bond\|eth' bond0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> eth0 DOWN e8:6a:64:64:64:64 <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> eth0.2@eth0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> Placing an explicit call to netdev_state_change() either in the suspend or the resume code in the NIC driver worked around this but the solution is not satisfying. The issue in fact really is in link_watch that loses events while it ought not to. It happens that the test for the device being present was added by commit `124eee3f69` ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") in 4.20 to avoid an access to devices that are not present. Instead of dropping events, this patch proceeds slightly differently by postponing their handling so that they happen after the device is fully resumed. Fixes: `124eee3f69` ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") Link: https://lists.openwall.net/netdev/2018/03/15/62 Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:25 +02:00
Yang Yingliang	6f7858d77f	net: bridge: fix memleak in br_add_if() [ Upstream commit 519133debcc19f5c834e7e28480b60bdc234fe02 ] I got a memleak report: BUG: memory leak unreferenced object 0x607ee521a658 (size 240): comm "syz-executor.0", pid 955, jiffies 4294780569 (age 16.449s) hex dump (first 32 bytes, cpu 1): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d830ea5a>] br_multicast_add_port+0x1c2/0x300 net/bridge/br_multicast.c:1693 [<00000000274d9a71>] new_nbp net/bridge/br_if.c:435 [inline] [<00000000274d9a71>] br_add_if+0x670/0x1740 net/bridge/br_if.c:611 [<0000000012ce888e>] do_set_master net/core/rtnetlink.c:2513 [inline] [<0000000012ce888e>] do_set_master+0x1aa/0x210 net/core/rtnetlink.c:2487 [<0000000099d1cafc>] __rtnl_newlink+0x1095/0x13e0 net/core/rtnetlink.c:3457 [<00000000a01facc0>] rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3488 [<00000000acc9186c>] rtnetlink_rcv_msg+0x369/0xa10 net/core/rtnetlink.c:5550 [<00000000d4aabb9c>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504 [<00000000bc2e12a3>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<00000000bc2e12a3>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340 [<00000000e4dc2d0e>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929 [<000000000d22c8b3>] sock_sendmsg_nosec net/socket.c:654 [inline] [<000000000d22c8b3>] sock_sendmsg+0x139/0x170 net/socket.c:674 [<00000000e281417a>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350 [<00000000237aa2ab>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404 [<000000004f2dc381>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433 [<0000000005feca6c>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47 [<000000007304477d>] entry_SYSCALL_64_after_hwframe+0x44/0xae On error path of br_add_if(), p->mcast_stats allocated in new_nbp() need be freed, or it will be leaked. Fixes: `1080ab95e3` ("net: bridge: add support for IGMP/MLD stats and export them via netlink") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210809132023.978546-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 18:54:25 +02:00

... 2 3 4 5 6 ...

976849 Commits