linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-14 23:56:48 +07:00

Author	SHA1	Message	Date
Linus Torvalds	d324810acd	for-linus-2019-11-21 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXdZ1DwAKCRCRxhvAZXjc orgYAP4k65bWhN5w3Qm60Wf+azD/mcDknhIhjsu3d0B4RnIfuwD8CytoLMgwLDm0 ZWbFscczIO3iv3KYuZEnC6j5PgyPtAs= =r8Sd -----END PGP SIGNATURE----- Merge tag 'for-linus-2019-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd fixlet from Christian Brauner: "This contains a simple fix for the pidfd poll method. In the original patchset pidfd_poll() was made to return an unsigned int. However, the poll method is defined to return a __poll_t. While the unsigned int is not a huge deal it's just nicer to return a __poll_t. I've decided to send it right before the 5.4 release mainly so that stable doesn't need to backport it to both 5.4 and 5.3" * tag 'for-linus-2019-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: fork: fix pidfd_poll()'s return type	2019-11-21 11:51:49 -08:00
Oliver Neukum	5f9f0b11f0	nfc: port100: handle command failure cleanly If starting the transfer of a command suceeds but the transfer for the reply fails, it is not enough to initiate killing the transfer for the command may still be running. You need to wait for the killing to finish before you can reuse URB and buffer. Reported-and-tested-by: syzbot+711468aa5c3a1eabf863@syzkaller.appspotmail.com Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:48:17 -08:00
Xin Long	1841b98299	lwtunnel: check erspan options before allocating tun_info As Jakub suggested on another patch, it's better to do the check on erspan options before allocating memory. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:47:39 -08:00
Xin Long	7b6a70f737	lwtunnel: be STRICT to validate the new LWTUNNEL_IP(6)_OPTS LWTUNNEL_IP(6)_OPTS are the new items in ip(6)_tun_policy, which are parsed by nla_parse_nested_deprecated(). We should check it strictly by setting .strict_start_type = LWTUNNEL_IP(6)_OPTS. This patch also adds missing LWTUNNEL_IP6_OPTS in ip6_tun_policy. Fixes: `4ece477870` ("lwtunnel: add options setting and dumping for geneve") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:47:23 -08:00
Xin Long	f3bed7f8f9	net: remove the unnecessary strict_start_type in some policies ct_policy and mpls_policy are parsed with nla_parse_nested(), which does NL_VALIDATE_STRICT validation, strict_start_type is not needed to set as it is actually trying to make some attributes parsed with NL_VALIDATE_STRICT. This patch is to remove it, and do the same on rtm_nh_policy which is parsed by nlmsg_parse(). Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:46:18 -08:00
David S. Miller	ff998a80c3	Merge branch 'net-sched-support-vxlan-and-erspan-options' Xin Long says: ==================== net: sched: support vxlan and erspan options This patchset is to add vxlan and erspan options support in cls_flower and act_tunnel_key. The form is pretty much like geneve_opts in: https://patchwork.ozlabs.org/patch/935272/ https://patchwork.ozlabs.org/patch/954564/ but only one option is allowed for vxlan and erspan. v1->v2: - see each patch changelog. ==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00
Xin Long	79b1011cb3	net: sched: allow flower to match erspan options This patch is to allow matching options in erspan. The options can be described in the form: VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK. When ver is set to 1, index will be applied while dir and hwid will be ignored, and when ver is set to 2, dir and hwid will be used while index will be ignored. Different from geneve, only one option can be set. And also, geneve options, vxlan options or erspan options can't be set at the same time. # ip link add name erspan1 type erspan external # tc qdisc add dev erspan1 ingress # tc filter add dev erspan1 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ erspan_opts 1:12:0:0/1:ffff:0:0 \ ip_proto udp \ action mirred egress redirect dev eth0 v1->v2: - improve some err msgs of extack. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00
Xin Long	d8f9dfae49	net: sched: allow flower to match vxlan options This patch is to allow matching gbp option in vxlan. The options can be described in the form GBP/GBP_MASK, where GBP is represented as a 32bit hexadecimal value. Different from geneve, only one option can be set. And also, geneve options and vxlan options can't be set at the same time. # ip link add name vxlan0 type vxlan dstport 0 external # tc qdisc add dev vxlan0 ingress # tc filter add dev vxlan0 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ vxlan_opts 01020304/ffffffff \ ip_proto udp \ action mirred egress redirect dev eth0 v1->v2: - add .strict_start_type for enc_opts_policy as Jakub noticed. - use Duplicate instead of Wrong in err msg for extack as Jakub suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00
Xin Long	e20d4ff2ac	net: sched: add erspan option support to act_tunnel_key This patch is to allow setting erspan options using the act_tunnel_key action. Different from geneve options, only one option can be set. And also, geneve options, vxlan options or erspan options can't be set at the same time. Options are expressed as ver:index:dir:hwid, when ver is set to 1, index will be applied while dir and hwid will be ignored, and when ver is set to 2, dir and hwid will be used while index will be ignored. # ip link add name erspan1 type erspan external # tc qdisc add dev eth0 ingress # tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 \ ip_proto udp \ action tunnel_key \ set src_ip 10.0.99.192 \ dst_ip 10.0.99.193 \ dst_port 6081 \ id 11 \ erspan_opts 1:2:0:0 \ action mirred egress redirect dev erspan1 v1->v2: - do the validation when dst is not yet allocated as Jakub suggested. - use Duplicate instead of Wrong in err msg for extack. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00
Xin Long	fca3f91cc3	net: sched: add vxlan option support to act_tunnel_key This patch is to allow setting vxlan options using the act_tunnel_key action. Different from geneve options, only one option can be set. And also, geneve options and vxlan options can't be set at the same time. gbp is the only param for vxlan options: # ip link add name vxlan0 type vxlan dstport 0 external # tc qdisc add dev eth0 ingress # tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 \ ip_proto udp \ action tunnel_key \ set src_ip 10.0.99.192 \ dst_ip 10.0.99.193 \ dst_port 6081 \ id 11 \ vxlan_opts 01020304 \ action mirred egress redirect dev vxlan0 v1->v2: - add .strict_start_type for enc_opts_policy as Jakub noticed. - use Duplicate instead of Wrong in err msg for extack as Jakub suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00
Dan Carpenter	0617aa988d	octeontx2-af: Fix uninitialized variable in debugfs If rvu_get_blkaddr() fails, then this rvu_cgx_nix_cuml_stats() returns zero and we write some uninitialized data into the debugfs output. On the error paths, the use of the uninitialized "*stat" is harmless, but it will lead to a Smatch warning (static analysis) and a UBSan warning (runtime analysis) so we should prevent that as well. Fixes: `f967488d09` ("octeontx2-af: Add per CGX port level NIX Rx/Tx counters") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:42:19 -08:00
Stefano Garzarella	039fcccaed	vsock: avoid to assign transport if its initialization fails If transport->init() fails, we can't assign the transport to the socket, because it's not initialized correctly, and any future calls to the transport callbacks would have an unexpected behavior. Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Reported-and-tested-by: syzbot+e2e5c07bf353b2f79daa@syzkaller.appspotmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:37:16 -08:00
Markus Theil	05d6c8cfdb	mt76: fix fix ampdu locking The current ampdu locking code does not unlock its mutex in the early return case. This patch fixes it. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Acked-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>	2019-11-21 20:38:30 +02:00
Peter Zijlstra	8954290765	x86/entry/32: Fix NMI vs ESPFIX When the NMI lands on an ESPFIX_SS, we are on the entry stack and must swizzle, otherwise we'll run do_nmi() on the entry stack, which is BAD. Also, similar to the normal exception path, we need to correct the ESPFIX magic before leaving the entry stack, otherwise pt_regs will present a non-flat stack pointer. Tested by running sigreturn_32 concurrent with perf-record. Fixes: `e5862d0515` ("x86/entry/32: Leave the kernel via trampoline stack") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@kernel.org	2019-11-21 19:37:44 +01:00
Andy Lutomirski	a1a338e5b6	x86/entry/32: Unwind the ESPFIX stack earlier on exception entry Right now, we do some fancy parts of the exception entry path while SS might have a nonzero base: we fill in regs->ss and regs->sp, and we consider switching to the kernel stack. This results in regs->ss and regs->sp referring to a non-flat stack and it may result in overflowing the entry stack. The former issue means that we can try to call iret_exc on a non-flat stack, which doesn't work. Tested with selftests/x86/sigreturn_32. Fixes: `45d7b25574` ("x86/entry/32: Enter the kernel via trampoline stack") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org	2019-11-21 19:37:44 +01:00
Andy Lutomirski	82cb8a0b1d	x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALL This will allow us to get percpu access working before FIXUP_FRAME, which will allow us to unwind ESPFIX earlier. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org	2019-11-21 19:37:43 +01:00
Andy Lutomirski	4c4fd55d3d	x86/entry/32: Use %ss segment where required When re-building the IRET frame we use %eax as an destination %esp, make sure to then also match the segment for when there is a nonzero SS base (ESPFIX). [peterz: Changelog and minor edits] Fixes: `3c88c692c2` ("x86/stackframe/32: Provide consistent pt_regs") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org	2019-11-21 19:37:43 +01:00
Peter Zijlstra	40ad219958	x86/entry/32: Fix IRET exception As reported by Lai, the commit `3c88c692c2` ("x86/stackframe/32: Provide consistent pt_regs") wrecked the IRET EXTABLE entry by making .Lirq_return not point at IRET. Fix this by placing IRET_FRAME in RESTORE_REGS, to mirror how FIXUP_FRAME is part of SAVE_ALL. Fixes: `3c88c692c2` ("x86/stackframe/32: Provide consistent pt_regs") Reported-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@kernel.org	2019-11-21 19:37:43 +01:00
Thomas Gleixner	880a98c339	x86/cpu_entry_area: Add guard page for entry stack on 32bit The entry stack in the cpu entry area is protected against overflow by the readonly GDT on 64-bit, but on 32-bit the GDT needs to be writeable and therefore does not trigger a fault on stack overflow. Add a guard page. Fixes: `c482feefe1` ("x86/entry/64: Make cpu_entry_area.tss read-only") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org	2019-11-21 19:37:43 +01:00
Thomas Gleixner	f490e07c53	x86/pti/32: Size initial_page_table correctly Commit `945fd17ab6` ("x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table") introduced the sync for the initial page table for 32bit. sync_initial_page_table() uses clone_pgd_range() which does the update for the kernel page table. If PTI is enabled it also updates the user space page table counterpart, which is assumed to be in the next page after the target PGD. At this point in time 32-bit did not have PTI support, so the user space page table update was not taking place. The support for PTI on 32-bit which was introduced later on, did not take that into account and missed to add the user space counter part for the initial page table. As a consequence sync_initial_page_table() overwrites any data which is located in the page behing initial_page_table causing random failures, e.g. by corrupting doublefault_tss and wreckaging the doublefault handler on 32bit. Fix it by adding a "user" page table right after initial_page_table. Fixes: `7757d607c6` ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joerg Roedel <jroedel@suse.de> Cc: stable@kernel.org	2019-11-21 19:37:43 +01:00
Andy Lutomirski	3580d0b29c	x86/doublefault/32: Fix stack canaries in the double fault handler The double fault TSS was missing GS setup, which is needed for stack canaries to work. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org	2019-11-21 19:37:42 +01:00
Navid Emamdoost	03bf73c315	nbd: prevent memory leak In nbd_add_socket when krealloc succeeds, if nsock's allocation fail the reallocted memory is leak. The correct behaviour should be assigning the reallocted memory to config->socks right after success. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-21 11:33:41 -07:00
Jens Axboe	866ca95da5	Merge branch 'nvme-5.5' of git://git.infradead.org/nvme into for-5.5/drivers-post Pull NVMe changes from Keith: "- The only new feature is the optional hwmon support for nvme (Guenter and Akinobu) - A universal work-around for controllers reading discard payloads beyond the range boundary (Eduard) - Chaitanya graciously agreed to share the target driver maintenance" * 'nvme-5.5' of git://git.infradead.org/nvme: nvme: hwmon: add quirk to avoid changing temperature threshold nvme: hwmon: provide temperature min and max values for each sensor nvmet: add another maintainer nvme: Discard workaround for non-conformant devices nvme: Add hardware monitoring support	2019-11-21 10:53:47 -07:00
Davidlohr Bueso	7f264dab5b	x86/mm/pat: Rename pat_rbtree.c to pat_interval.c Considering the previous changes, this is a more proper name. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20191121011601.20611-5-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-21 18:48:18 +01:00
Davidlohr Bueso	511aaca834	x86/mm/pat: Drop the rbt_ prefix from external memtype calls Drop the rbt_memtype_*() call rbt_ prefix, as we no longer use an rbtree directly. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20191121011601.20611-4-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-21 18:48:07 +01:00
Davidlohr Bueso	6a9930b1c5	x86/mm/pat: Do not pass 'rb_root' down the memtype tree helper functions Get rid of the passing the rb_root down the helper calls; there is only one: &memtype_rbroot. No change in functionality. [ mingo: Fixed the changelog which described a different version of the patch. ] Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20191121011601.20611-3-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-21 18:47:59 +01:00
Davidlohr Bueso	8d04a5f97a	x86/mm/pat: Convert the PAT tree to a generic interval tree With some considerations, the custom pat_rbtree implementation can be simplified to use most of the generic interval_tree machinery: - The tree inorder traversal can slightly differ when there are key ('start') collisions in the tree due to one going left and another right. This, however, only affects the output of debugfs' pat_memtype_list file. - Generic interval trees are now fully closed [a, b], for which we need to adjust the last endpoint (ie: end - 1). - Erasing logic must remain untouched as well. - In order for the types to remain u64, the 'memtype_interval' calls are introduced, as opposed to simply using struct interval_tree. In addition, the PAT tree might potentially also benefit by the fast overlap detection for the insertion case when looking up the first overlapping node in the tree. No change in behavior is intended. Finally, I've tested this on various servers, via sanity warnings, running side by side with the current version and so far see no differences in the returned pointer node when doing memtype_rb_lowest_match() lookups. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20191121011601.20611-2-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-21 18:47:30 +01:00
Akinobu Mita	6c6aa2f26c	nvme: hwmon: add quirk to avoid changing temperature threshold This adds a new quirk NVME_QUIRK_NO_TEMP_THRESH_CHANGE to avoid changing the value of the temperature threshold feature for specific devices that show undesirable behavior. Guenter reported: "On my Intel NVME drive (SSDPEKKW512G7), writing any minimum limit on the Composite temperature sensor results in a temperature warning, and that warning is sticky until I reset the controller. It doesn't seem to matter which temperature I write; writing -273000 has the same result." The Intel NVMe has the latest firmware version installed, so this isn't a problem that was ever fixed. Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jean Delvare <jdelvare@suse.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2019-11-22 02:21:08 +09:00
Akinobu Mita	52deba0f02	nvme: hwmon: provide temperature min and max values for each sensor According to the NVMe specification, the over temperature threshold and under temperature threshold features shall be implemented for Composite Temperature if a non-zero WCTEMP field value is reported in the Identify Controller data structure. The features are also implemented for all implemented temperature sensors (i.e., all Temperature Sensor fields that report a non-zero value). This provides the over temperature threshold and under temperature threshold for each sensor as temperature min and max values of hwmon sysfs attributes. The WCTEMP is already provided as a temperature max value for Composite Temperature, but this change isn't incompatible. Because the default value of the over temperature threshold for Composite Temperature is the WCTEMP. Now the alarm attribute for Composite Temperature indicates one of the temperature is outside of a temperature threshold. Because there is only a single bit in Critical Warning field that indicates a temperature is outside of a threshold. Example output from the "sensors" command: nvme-pci-0100 Adapter: PCI adapter Composite: +33.9°C (low = -273.1°C, high = +69.8°C) (crit = +79.8°C) Sensor 1: +34.9°C (low = -273.1°C, high = +65261.8°C) Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C) Sensor 5: +47.9°C (low = -273.1°C, high = +65261.8°C) This also adds helper macros for kelvin from/to milli Celsius conversion, and replaces the repeated code in hwmon.c. Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jean Delvare <jdelvare@suse.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2019-11-22 02:21:08 +09:00
Christoph Hellwig	3aeb6a24f1	nvmet: add another maintainer Sagi and I have been pretty busy lately, and Chaitanya has been helping a lot with target work and agreed to share the load. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2019-11-22 02:21:05 +09:00
Jens Axboe	1e279153df	Revert "block: split bio if the only bvec's length is > SZ_4K" We really don't need this, as the slow path will do the right thing anyway. This reverts commit `6952a7f844`. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-21 10:16:12 -07:00
Konstantin Khlebnikov	b686631865	block: add iostat counters for flush requests Requests that triggers flushing volatile writeback cache to disk (barriers) have significant effect to overall performance. Block layer has sophisticated engine for combining several flush requests into one. But there is no statistics for actual flushes executed by disk. Requests which trigger flushes usually are barriers - zero-size writes. This patch adds two iostat counters into /sys/class/block/$dev/stat and /proc/diskstats - count of completed flush requests and their total time. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-21 09:06:47 -07:00
Adrian Hunter	98dcf14d7f	perf tools: Add kernel AUX area sampling definitions Add kernel AUX area sampling definitions, which brings perf_event.h into line with the kernel version. New sample type PERF_SAMPLE_AUX requests a sample of the AUX area buffer. New perf_event_attr member 'aux_sample_size' specifies the desired size of the sample. Also add support for parsing samples containing AUX area data i.e. PERF_SAMPLE_AUX. Committer notes: I squashed the first two patches in this series to avoid breaking automatic bisection, i.e. after applying only the original first patch in this series we would have: # perf test -v parsing 26: Sample parsing : --- start --- test child forked, pid 17018 sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating test child finished with -1 ---- end ---- Sample parsing: FAILED! # With the two paches combined: # perf test parsing 26: Sample parsing : Ok # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-21 10:54:20 -03:00
Paolo Bonzini	c50d8ae3a1	KVM: x86: create mmu/ subdirectory Preparatory work for shattering mmu.c into multiple files. Besides making it easier to follow, this will also make it possible to write unit tests for various parts. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 12:03:50 +01:00
Liran Alon	0155b2b91b	KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use of the INVVPID/INVEPT Instruction, the hypervisor needs to execute INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and either: "Virtualize APIC accesses" VM-execution control was changed from 0 to 1, OR the value of apic_access_page was changed. In the nested case, the burden falls on L1, unless L0 enables EPT in vmcs02 but L1 enables neither EPT nor VPID in vmcs12. For this reason prepare_vmcs02() and load_vmcs12_host_state() have special code to request a TLB flush in case L1 does not use EPT but it uses "virtualize APIC accesses". This special case however is not necessary. On a nested vmentry the physical TLB will already be flushed except if all the following apply: * L0 uses VPID * L1 uses VPID * L0 can guarantee TLB entries populated while running L1 are tagged differently than TLB entries populated while running L2. If the first condition is false, the processor will flush the TLB on vmentry to L2. If the second or third condition are false, prepare_vmcs02() will request KVM_REQ_TLB_FLUSH. However, even if both are true, no extra TLB flush is needed to handle the APIC access page: * if L1 doesn't use VPID, the second condition doesn't hold and the TLB will be flushed anyway. * if L1 uses VPID, it has to flush the TLB itself with INVVPID and section 28.3.3.3 doesn't apply to L0. * even INVEPT is not needed because, if L0 uses EPT, it uses different EPTP when running L2 than L1 (because guest_mode is part of mmu-role). In this case SDM section 28.3.3.4 doesn't apply. Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(), one could note that L0 won't flush TLB only in cases where SDM sections 28.3.3.3 and 28.3.3.4 don't apply. In particular, if L0 uses different VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section 28.3.3.3 doesn't apply. Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit(). Side-note: This patch can be viewed as removing parts of commit `fb6c819843` ("kvm: vmx: Flush TLB when the APIC-access address changes”) that is not relevant anymore since commit `1313cc2bd8` ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”). i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT, then L0 will use same EPTP for both L0 and L1. Which indeed required L0 to execute INVEPT before entering L2 guest. This assumption is not true anymore since when guest_mode was added to mmu-role. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 12:03:50 +01:00
Mao Wenan	db5a95ec16	KVM: x86: remove set but not used variable 'called' Fixes gcc '-Wunused-but-set-variable' warning: arch/x86/kvm/x86.c: In function kvm_make_scan_ioapic_request_mask: arch/x86/kvm/x86.c:7911:7: warning: variable called set but not used [-Wunused-but-set-variable] It is not used since commit `7ee30bc132` ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Mao Wenan <maowenan@huawei.com> Fixes: `7ee30bc132` ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 12:03:49 +01:00
Liran Alon	b11494bcab	KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning vmcs->apic_access_page is simply a token that the hypervisor puts into the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers APIC-access VMExit or APIC virtualization logic whenever a CPU running in VMX non-root mode read/write from/to this PFN. As every write either triggers an APIC-access VMExit or write is performed on vmcs->virtual_apic_page, the PFN pointed to by vmcs->apic_access_page should never actually be touched by CPU. Therefore, there is no need to mark vmcs02->apic_access_page as dirty after unpin it on L2->L1 emulated VMExit or when L1 exit VMX operation. Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 12:03:48 +01:00
Paolo Bonzini	46f4f0aabc	Merge branch 'kvm-tsx-ctrl' into HEAD Conflicts: arch/x86/kvm/vmx/vmx.c	2019-11-21 12:03:40 +01:00
Paolo Bonzini	b07a5c53d4	KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it If X86_FEATURE_RTM is disabled, the guest should not be able to access MSR_IA32_TSX_CTRL. We can therefore use it in KVM to force all transactions from the guest to abort. Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 10:01:02 +01:00
Paolo Bonzini	c11f83e062	KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality The current guest mitigation of TAA is both too heavy and not really sufficient. It is too heavy because it will cause some affected CPUs (those that have MDS_NO but lack TAA_NO) to fall back to VERW and get the corresponding slowdown. It is not really sufficient because it will cause the MDS_NO bit to disappear upon microcode update, so that VMs started before the microcode update will not be runnable anymore afterwards, even with tsx=on. Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for the guest and let it run without the VERW mitigation. Even though MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write it on every vmentry, we can use the shared MSR functionality because the host kernel need not protect itself from TSX-based side-channels. Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 10:00:59 +01:00
Paolo Bonzini	edef5c36b0	KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID Because KVM always emulates CPUID, the CPUID clear bit (bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually" by the hypervisor when performing said emulation. Right now neither kvm-intel.ko nor kvm-amd.ko implement MSR_IA32_TSX_CTRL but this will change in the next patch. Reviewed-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 09:59:31 +01:00
Paolo Bonzini	de1fca5d6e	KVM: x86: do not modify masked bits of shared MSRs "Shared MSRs" are guest MSRs that are written to the host MSRs but keep their value until the next return to userspace. They support a mask, so that some bits keep the host value, but this mask is only used to skip an unnecessary MSR write and the value written to the MSR is always the guest MSR. Fix this and, while at it, do not update smsr->values[slot].curr if for whatever reason the wrmsr fails. This should only happen due to reserved bits, so the value written to smsr->values[slot].curr will not match when the user-return notifier and the host value will always be restored. However, it is untidy and in rare cases this can actually avoid spurious WRMSRs on return to userspace. Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 09:59:22 +01:00
Paolo Bonzini	cbbaa2727a	KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented to the guests. It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR && !RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not hidden (it actually was), yet the value says that TSX is not vulnerable to microarchitectural data sampling. Fix both. Cc: stable@vger.kernel.org Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-11-21 09:59:10 +01:00
Paolo Bonzini	14edff8831	KVM/arm updates for Linux 5.5: - Allow non-ISV data aborts to be reported to userspace - Allow injection of data aborts from userspace - Expose stolen time to guests - GICv4 performance improvements - vgic ITS emulation fixes - Simplify FWB handling - Enable halt pool counters - Make the emulated timer PREEMPT_RT compliant -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl3VZ0EPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDwikQAJ/lQT97zEKV3dnpD/jjmEic/3QvTGljS4p+ pbwZAzyoSMc09lAK2pkaGRRc7euPnp4uLdRS4SenToVzmUQCzuxpQEEMdV/wjp4V WLQ1WnTEAhYkm7k5MVo4uy3eD7nVWHWXgfQJvzL4EYZ5R/gd9NzBrnAc6LLV6hp9 0eLXIYrGFa1GESzF6P6sBDJhYpqVUcQlTI8I43kZH3iCC4+OsBxIkhHREZYsELhW MZJIM9ZCskg2tvPC4UysaFiGjBYUJNJ0V+fFOrhyGzludP8i8rNRwgA60mznwvNw V4N6/gLlkGK7nLqP+noUcU2wTBnIu389TcWWsX47CuDUzawCd8Fb9kX2zYauQSyS ujE0uzoo/nhPFysh9OVVeLUZ6o/wotXoMp2t32t1c5h9N1hISEJvAWavMxTY6KzF NEn9hWFjNcgBoArz9GKn9p2nBQpCDvu+2SlI4nL/qgZ7lPC4O3U1uq9myCOLj/gu Can/u5EAwgyIBDVcEPHV+vP2GjyeERdXprGiG2VJTYlbHsdjgISTR+5Fy32KdGlP YygeZxJtzretr3AYsWqD6Mri30FDSoYy9rUOWBpa+ZHbJPac0M+uKOqntV1OxPX9 QUkmNEdJcDr8fkcKxnEZ/MaZxFTGPp4vfhiT4A7dUkWTFq7ajvGo8IwN1d7PvWxS LFMij1Js =SxBH -----END PGP SIGNATURE----- Merge tag 'kvmarm-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for Linux 5.5: - Allow non-ISV data aborts to be reported to userspace - Allow injection of data aborts from userspace - Expose stolen time to guests - GICv4 performance improvements - vgic ITS emulation fixes - Simplify FWB handling - Enable halt pool counters - Make the emulated timer PREEMPT_RT compliant Conflicts: include/uapi/linux/kvm.h	2019-11-21 09:58:35 +01:00
Chris Wilson	71d122629c	drm/i915/fbdev: Restore physical addresses for fb_mmap() fbdev uses the physical address of our framebuffer for its fb_mmap() routine. While we need to adapt this address for the new io BAR, we have to fix v5.4 first! The simplest fix is to restore the smem back to v5.3 and we will then probably have to implement our fbops->fb_mmap() callback to handle local memory. Reported-by: Neil MacLeod <freedesktop@nmacleod.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112256 Fixes: `5f889b9a61` ("drm/i915: Disregard drm_mode_config.fb_base") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Tested-by: Neil MacLeod <freedesktop@nmacleod.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191113180633.3947-1-chris@chris-wilson.co.uk (cherry picked from commit `abc5520704`) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit `9faf5fa4d3`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2019-11-21 00:09:22 -08:00
Mohammad Rasim	1199ab4c9e	Bluetooth: btbcm: Add entry for BCM4335A0 UART bluetooth This patch adds the device ID for the BCM4335A0 module (part of the AMPAK AP6335 WIFI/Bluetooth combo) hciconfig output: ``` hci1: Type: Primary Bus: UART BD Address: 43:35:B0:07:1F:AC ACL MTU: 1021:8 SCO MTU: 64:1 UP RUNNING RX bytes:5079 acl:0 sco:0 events:567 errors:0 TX bytes:69065 acl:0 sco:0 commands:567 errors:0 Features: 0xbf 0xfe 0xcf 0xff 0xdf 0xff 0x7b 0x87 Packet type: DM1 DM3 DM5 DH1 DH3 DH5 HV1 HV2 HV3 Link policy: RSWITCH SNIFF Link mode: SLAVE ACCEPT Name: 'alarm' Class: 0x000000 Service Classes: Unspecified Device Class: Miscellaneous, HCI Version: 4.0 (0x6) Revision: 0x161 LMP Version: 4.0 (0x6) Subversion: 0x4106 Manufacturer: Broadcom Corporation (15) ``` Signed-off-by: Mohammad Rasim <mohammad.rasim96@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-11-21 08:10:46 +01:00
Mohammad Rasim	e32ec8ea0d	dt-bindings: net: Add compatible for BCM4335A0 bluetooth Available in the Ampak AP6335 WiFi/Bluetooth combo Signed-off-by: Mohammad Rasim <mohammad.rasim96@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-11-21 08:10:46 +01:00
Frederic Weisbecker	8392853e96	rackmeter: Use vtime aware kcpustat accessor Now that we have a vtime safe kcpustat accessor, use it to fetch CPUTIME_NICE and fix frozen kcpustat values on nohz_full CPUs. Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpengli@tencent.com> Link: https://lkml.kernel.org/r/20191121024430.19938-7-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-21 07:59:00 +01:00
Frederic Weisbecker	8688f2fb67	leds: Use all-in-one vtime aware kcpustat accessor We can now safely read user kcpustat fields on nohz_full CPUs. Use the appropriate accessor. [ mingo: Fixed build failure. ] Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> (maintainer:LED SUBSYSTEM) Cc: Pavel Machek <pavel@ucw.cz> (maintainer:LED SUBSYSTEM) Cc: Dan Murphy <dmurphy@ti.com> (reviewer:LED SUBSYSTEM) Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpengli@tencent.com> Link: https://lkml.kernel.org/r/20191121024430.19938-6-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-21 07:58:48 +01:00
Chuhong Yuan	e2c0567597	pcmcia: Use dev_get_drvdata where possible Instead of using to_pci_dev + pci_get_drvdata, use dev_get_drvdata to make code simpler. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2019-11-21 07:51:46 +01:00

... 7 8 9 10 11 ...

878708 Commits