linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-21 02:01:05 +07:00

Author	SHA1	Message	Date
Guojia Liao	e4b806edfa	net: hns3: cleanup a format-truncation warning In hns3_nic_init_irq(), when '_int_idx' has more than 9 digits and the length of netdev's name is IFNAMSIZ, the total length of final name will be bigger the HNAE3_INT_NAME_LEN - 1, even though '_int_idx' will never have such large value, but the compiler gives a format-truncation warning for this case. So this patch just enlarges the length to avoid this warning. Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 12:03:23 -07:00
Guangbin Huang	db4d3d554e	net: hns3: cleanup some coding style issues To unify code style and make code simpler, this patch modifies some code, deletes unnecessary blank lines and {}, changes location of code, and so on. No functional change. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 12:03:23 -07:00
Guojia Liao	d6ad7c5306	net: hns3: cleanup some magic numbers To make the code more readable, this patch replaces some magic numbers with macro or sizeof operation. Also uses macro lower_32_bits and upper_32_bits to get bits 0-31 and 32-63 of a number, instead of using type conversion and '>>' operation. No functional change. Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 12:03:23 -07:00
Yunsheng Lin	647522a5ef	net: hns3: add struct netdev_queue debug info for TX timeout When there is a TX timeout, we can tell if the driver or stack has stopped the queue by looking at state field, and when has the last packet transmited by looking at trans_start field. So this patch prints these two field in the hns3_get_tx_timeo_queue_info(). Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 12:03:23 -07:00
Huazhong Tan	3d77d0cb05	net: hns3: dump some debug information when reset fail When reset fails, there is some information that will help for finding out why does reset fail. and removes an unused core_rst_cnt field in struct hclge_rst_stats. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 12:03:22 -07:00
David S. Miller	5a7ec66782	Just two fixes: * HT operation is not allowed on channel 14 (Japan only) * netlink policy for nexthop attribute was wrong -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl26uKkACgkQB8qZga/f l8S8lg//UB9ocCdFI2w5EMjpCpdOlrHsbOfyyeKdYtGZuy30YXIWg6AqU1sy3Cwr SMYoB1RWrrnnEAV1LirMXmYuK48gIAGvW3t/WHhTy0W6vhcEeOSGC5LKAsZAexR9 uG9BpcqkYhsx8IeXvw2tjhBG+poqZymmmCbujfaUA1+ZTcOk5C8vMWONp079Os6L RfkKRDwjxUG68j68aHEHeQX26MQGPXAhm0IQm7Or6KXn2ZKs18h0h5+Yp5RCUbTe Gqov2O1GGdcirz2rckLQv0kXblVzgRYOFnrvu+ePvcMeVVLH81Lq19RyccgDBC58 iUdDpnpmJQ4gFLCFOdFVYewsSJ+VNmHqE9Wo0dqj+kNLuN1C6KTaS81TNMY0eeVw M5GDVQLTxO0hiiIweQD/1PhB3CJqyxLbahZ2ikHTMcjQLy1CSHaGYWGV5dxkkRbk O0vH5LAb/VuTI0kZMIoCJAph76snWFpPFNTbrdO4DEY11wetVb0uUcq2LY7DoYXp cEysfyloWXNpUcpB21b6o5eKtqTvZZ1Ot1Q5Z5UpEEU1GVE05iAJeVLVHGy3fQCF zSymwtdc6OjPjYCh/lMHMq9tSfywHX0aseD1dBpe9KkLIyloa76xNnLhUrRTets9 DdVuu2Fi1HDIWkOUd2Gok2aAQrNTYI+zb2xPsnB1mN1swc4V5o4= =B0D8 -----END PGP SIGNATURE----- Merge tag 'mac80211-for-net-2019-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just two fixes: * HT operation is not allowed on channel 14 (Japan only) * netlink policy for nexthop attribute was wrong ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 11:43:36 -07:00
David S. Miller	7969774430	Merge branch 'bnxt_en-Add-OP-TEE-based-bnxt-f-w-manager' Sheetal Tigadoli says: ==================== bnxt_en: Add OP-TEE based bnxt f/w manager This patch series adds support for TEE based BNXT firmware management module and the driver changes to invoke OP-TEE APIs to fastboot firmware and to collect crash dump. Changes from v4: - update Kconfig to reflect dependency on TEE driver ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 11:00:45 -07:00
Vasundhara Volam	0b0eacf3c8	bnxt_en: Add support to collect crash dump via ethtool Driver supports 2 types of core dumps. 1. Live dump - Firmware dump when system is up and running. 2. Crash dump - Dump which is collected during firmware crash that can be retrieved after recovery. Crash dump is currently supported only on specific 58800 chips which can be retrieved using OP-TEE API only, as firmware cannot access this region directly. User needs to set the dump flag using following command before initiating the dump collection: $ ethtool -W\|--set-dump eth0 N Where N is "0" for live dump and "1" for crash dump Command to collect the dump after setting the flag: $ ethtool -w eth0 data Filename v3: Modify set_dump to support even when CONFIG_TEE_BNXT_FW=n. Also change log message to netdev_info(). Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Sheetal Tigadoli <sheetal.tigadoli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 11:00:45 -07:00
Vasundhara Volam	e07ab2021e	bnxt_en: Add support to invoke OP-TEE API to reset firmware In error recovery process when firmware indicates that it is completely down, initiate a firmware reset by calling OP-TEE API. Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Sheetal Tigadoli <sheetal.tigadoli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 11:00:45 -07:00
Vikas Gupta	246880958a	firmware: broadcom: add OP-TEE based BNXT f/w manager This driver registers on TEE bus to interact with OP-TEE based BNXT firmware management modules Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Sheetal Tigadoli <sheetal.tigadoli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 11:00:45 -07:00
David S. Miller	8c933eab2d	Merge branch 'mlxsw-Make-port-split-code-more-generic' Ido Schimmel says: ==================== mlxsw: Make port split code more generic Jiri says: Currently, we assume some limitations and constant values which are not applicable for Spectrum-3 which has 8 lanes ports (instead of previous 4 lanes). This patch does 2 things: 1) Generalizes the code to not use constants so it can work for 4, 8 and possibly 16 lanes. 2) Enforces some assumptions we had in the code but did not check. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	973b7fdb5f	mlxsw: spectrum: Generalize split count check Make the check generic for any possible value, not only 2 and 4. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	fbbeea3102	mlxsw: spectrum: Iterate over all ports in gap during unsplit create During recreation of original unsplit ports, just simply iterate over the whole gap and recreate whatever originally existed. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	c3a64b5173	mlxsw: spectrum: Fix base port get for split count 4 and 8 The current code considers only split by 2 or 4. Make the base port getting generic and allow split by 8 to be handled correctly. Generalize the used port checks as well. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	013da29791	mlxsw: spectrum: Use port_module_max_width to compute base port index Instead of using constant value, use port_module_max_width which is aligned with the cluster size. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	49185277cc	mlxsw: spectrum: Remember split base local port and use it in unsplit Don't compute the original base local port during unsplit, rather remember it in mlxsw_sp_port structure during split port creation. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	038784a9df	mlxsw: spectrum: Introduce resource for getting offset of 4 lanes split port In Spectrum-3 the modules have 8 lanes, so split by count 2 results in two split ports each of 4 lanes. Add a resource that can be used to obtain local port offset in that case. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	d0846ce9aa	mlxsw: spectrum: Push getting offsets of split ports into a helper Get local port offsets of split port in a separate helper function and use it in both split and unsplit function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	c8fc10dc17	mlxsw: spectrum: Add sanity checks into module info get Driver assumes certain values in the PMLP register. Add checks that verify that PMLP register provides fitting values. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	35896d9641	mlxsw: spectrum: Pass mapping values in port mapping structure Pass the port mapping structure down to create, module_map and other function instead of individual values. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	7b39fa5bef	mlxsw: spectrum: Use mapping of port being split for creating split ports Don't use constant max width value and instead of that, use the actual width of the port. Also don't pass module value and use the value stored in the same structure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	4a7f970f12	mlxsw: spectrum: Replace port_to_module array with array of structs Store the initial PMLP register configuration into array of structures instead of just simple array of module numbers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	26a6befa5d	mlxsw: spectrum: Distinguish between unsplittable and split port Currently when user does split, he is not able to distinguish if the port cannot be split because it is already split, or because it cannot be split at all. Add another check for split flag to distinguish this. Also add check forbidding split when maximal width is 1. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:47 -07:00
Jiri Pirko	2e6a2d7b45	mlxsw: spectrum: Move max_width check up before count check The fact that the port cannot be split further should be checked before checking the count, so move it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:46 -07:00
Jiri Pirko	25911e1b97	mlxsw: spectrum: Use PMTM register to get max module width Currently the max module width is hard-coded according to ASIC type. That is not entirely correct, as the max module width might differ per-board. Use PMTM register to query FW for maximal width of a module. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:46 -07:00
Jiri Pirko	a513b1a591	mlxsw: reg: Add Port Module Type Mapping Register The PMTM allows query or configuration of module types. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:46 -07:00
Jiri Pirko	94e768373a	mlxsw: reg: Extend PMLP tx/rx lane value size to 4 bits The tx/rx lane fields got extended to 4 bits, update the reg field description accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:54:46 -07:00
Dan Carpenter	41591a51f0	iocost: don't nest spin_lock_irq in ioc_weight_write() This code causes a static analysis warning: block/blk-iocost.c:2113 ioc_weight_write() error: double lock 'irq' We disable IRQs in blkg_conf_prep() and re-enable them in blkg_conf_finish(). IRQ disable/enable should not be nested because that means the IRQs will be enabled at the first unlock instead of the second one. Fixes: `7caa47151a` ("blkcg: implement blk-iocost") Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-10-31 11:40:57 -06:00
Christophe JAILLET	d74361dc58	cxgb4/l2t: Simplify 't4_l2e_free()' and '_t4_l2e_free()' Use '__skb_queue_purge()' instead of re-implementing it. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 10:37:59 -07:00
Daniel Borkmann	0608711460	Merge branch 'bpf-cleanup-btf-raw-tp' Alexei Starovoitov says: ==================== v1->v2: addressed Andrii's feedback When BTF-enabled raw_tp were introduced the plan was to follow up with BTF-enabled kprobe and kretprobe reusing PROG_RAW_TRACEPOINT and PROG_KPROBE types. But k[ret]probe expect pt_regs while BTF-enabled program ctx will be the same as raw_tp. kretprobe is indistinguishable from kprobe while BTF-enabled kretprobe will have access to retval while kprobe will not. Hence PROG_KPROBE type is not reusable and reusing PROG_RAW_TRACEPOINT no longer fits well. Hence introduce 'umbrella' prog type BPF_PROG_TYPE_TRACING that will cover different BTF-enabled tracing attach points. The changes make libbpf side cleaner as well. check_attach_btf_id() is cleaner too. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-10-31 15:17:14 +01:00
Alexei Starovoitov	12a8654b2e	libbpf: Add support for prog_tracing Cleanup libbpf from expected_attach_type == attach_btf_id hack and introduce BPF_PROG_TYPE_TRACING. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191030223212.953010-3-ast@kernel.org	2019-10-31 15:16:59 +01:00
Alexei Starovoitov	f1b9509c2f	bpf: Replace prog_raw_tp+btf_id with prog_tracing The bpf program type raw_tp together with 'expected_attach_type' was the most appropriate api to indicate BTF-enabled raw_tp programs. But during development it became apparent that 'expected_attach_type' cannot be used and new 'attach_btf_id' field had to be introduced. Which means that the information is duplicated in two fields where one of them is ignored. Clean it up by introducing new program type where both 'expected_attach_type' and 'attach_btf_id' fields have specific meaning. In the future 'expected_attach_type' will be extended with other attach points that have similar semantics to raw_tp. This patch is replacing BTF-enabled BPF_PROG_TYPE_RAW_TRACEPOINT with prog_type = BPF_RPOG_TYPE_TRACING expected_attach_type = BPF_TRACE_RAW_TP attach_btf_id = btf_id of raw tracepoint inside the kernel Future patches will add expected_attach_type = BPF_TRACE_FENTRY or BPF_TRACE_FEXIT where programs have the same input context and the same helpers, but different attach points. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191030223212.953010-2-ast@kernel.org	2019-10-31 15:16:59 +01:00
Bjorn Andersson	36c602dcdd	arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo The Kryo cores share errata 1009 with Falkor, so add their model definitions and enable it for them as well. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> [will: Update entry in silicon-errata.rst] Signed-off-by: Will Deacon <will@kernel.org>	2019-10-31 13:22:12 +00:00
Paolo Bonzini	9167ab7993	KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active VMX already does so if the host has SMEP, in order to support the combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in fact VMX already ends up running with EFER.NXE=1 on old processors that lack the "load EFER" controls, because it may help avoiding a slow MSR write. Removing all the conditionals simplifies the code. SVM does not have similar code, but it should since recent AMD processors do support SMEP. So this patch also makes the code for the two vendors more similar while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors. Cc: stable@vger.kernel.org Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-31 12:13:44 +01:00
Jim Mattson	a97b0e773e	kvm: call kvm_arch_destroy_vm if vm creation fails In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but then fail later in the function, we need to call kvm_arch_destroy_vm() so that it can do any necessary cleanup (like freeing memory). Fixes: `44a95dae1d` ("KVM: x86: Detect and Initialize AVIC support") Signed-off-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Junaid Shahid <junaids@google.com> [Remove dependency on "kvm: Don't clear reference count on kvm_create_vm() error path" which was not committed. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-31 12:13:16 +01:00
Javier Martinez Canillas	359efcc2c9	efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN The driver exposes EFI runtime services to user-space through an IOCTL interface, calling the EFI services function pointers directly without using the efivar API. Disallow access to the /dev/efi_test character device when the kernel is locked down to prevent arbitrary user-space to call EFI runtime services. Also require CAP_SYS_ADMIN to open the chardev to prevent unprivileged users to call the EFI runtime services, instead of just relying on the chardev file mode bits for this. The main user of this driver is the fwts [0] tool that already checks if the effective user ID is 0 and fails otherwise. So this change shouldn't cause any regression to this tool. [0]: https://wiki.ubuntu.com/FirmwareTestSuite/Reference/uefivarinfo Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Matthew Garrett <mjg59@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-7-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-31 09:40:21 +01:00
Kairui Song	220dd7699c	x86, efi: Never relocate kernel below lowest acceptable address Currently, kernel fails to boot on some HyperV VMs when using EFI. And it's a potential issue on all x86 platforms. It's caused by broken kernel relocation on EFI systems, when below three conditions are met: 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR) by the loader. 2. There isn't enough room to contain the kernel, starting from the default load address (eg. something else occupied part the region). 3. In the memmap provided by EFI firmware, there is a memory region starts below LOAD_PHYSICAL_ADDR, and suitable for containing the kernel. EFI stub will perform a kernel relocation when condition 1 is met. But due to condition 2, EFI stub can't relocate kernel to the preferred address, so it fallback to ask EFI firmware to alloc lowest usable memory region, got the low region mentioned in condition 3, and relocated kernel there. It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This is the lowest acceptable kernel relocation address. The first thing goes wrong is in arch/x86/boot/compressed/head_64.S. Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output address if kernel is located below it. Then the relocation before decompression, which move kernel to the end of the decompression buffer, will overwrite other memory region, as there is no enough memory there. To fix it, just don't let EFI stub relocate the kernel to any address lower than lowest acceptable address. [ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ] Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-31 09:40:19 +01:00
Ard Biesheuvel	41cd96fa14	efi: libstub/arm: Account for firmware reserved memory at the base of RAM The EFI stubloader for ARM starts out by allocating a 32 MB window at the base of RAM, in order to ensure that the decompressor (which blindly copies the uncompressed kernel into that window) does not overwrite other allocations that are made while running in the context of the EFI firmware. In some cases, (e.g., U-Boot running on the Raspberry Pi 2), this is causing boot failures because this initial allocation conflicts with a page of reserved memory at the base of RAM that contains the SMP spin tables and other pieces of firmware data and which was put there by the bootloader under the assumption that the TEXT_OFFSET window right below the kernel is only used partially during early boot, and will be left alone once the memory reservations are processed and taken into account. So let's permit reserved memory regions to exist in the region starting at the base of RAM, and ending at TEXT_OFFSET - 5 * PAGE_SIZE, which is the window below the kernel that is not touched by the early boot code. Tested-by: Guillaume Gardet <Guillaume.Gardet@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Chester Lin <clin@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-5-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-31 09:40:19 +01:00
Dominik Brodowski	18b915ac6b	efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness Commit `428826f535` ("fdt: add support for rng-seed") introduced add_bootloader_randomness(), permitting randomness provided by the bootloader or firmware to be credited as entropy. However, the fact that the UEFI support code was already wired into the RNG subsystem via a call to add_device_randomness() was overlooked, and so it was not converted at the same time. Note that this UEFI (v2.4 or newer) feature is currently only implemented for EFI stub booting on ARM, and further note that CONFIG_RANDOM_TRUST_BOOTLOADER must be enabled, and this should be done only if there indeed is sufficient trust in the bootloader _and_ its source of randomness. [ ardb: update commit log ] Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-4-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-31 09:40:18 +01:00
Jerry Snitselaar	2bb6a81633	efi/tpm: Return -EINVAL when determining tpm final events log size fails Currently nothing checks the return value of efi_tpm_eventlog_init(), but in case that changes in the future make sure an error is returned when it fails to determine the tpm final events log size. Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: `e658c82be5` ("efi/tpm: Only set 'efi_tpm_final_log_size' after ...") Link: https://lkml.kernel.org/r/20191029173755.27149-3-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-31 09:40:17 +01:00
Narendra K	0b6b30c656	efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only For the EFI_RCI2_TABLE Kconfig option, 'make oldconfig' asks the user for input on platforms where the option may not be applicable. This patch modifies the Kconfig option to ask the user for input only when CONFIG_X86 or CONFIG_COMPILE_TEST is set to y. Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Narendra K <Narendra.K@dell.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-2-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-31 09:40:16 +01:00
Linus Torvalds	e472c64aa4	dmaengine fixes for v5.4-rc6 Few fixes on the dmaengine drivers: - fix in sprd driver for link list and potential memory leak - tegra transfer failure fix - imx size check fix for script_number - xilinx fix for 64bit AXIDMA and control reg update - qcom bam dma resource leak fix - cppi slave transfer fix when idle -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl26hEgACgkQfBQHDyUj g0cAjRAAweFLgnBti+Cqb3mjbHxYfSknw/4CNis5XvhXfCGUupgqQQfL1mjH1ZGv GPz0lOjgkaUwtioxMrIqGSc/7MCsHOK02kktfY87SSraSQrKF2Hl9Y0KRYu8FIWW AFvmbhXAtLqRY87r6K/S7LzQzSrYjfGHxHV5ShsA18+2Y2gtuQQ2V6nbkjZquOle So27cjAZXYQJP9YI2DJXpcgwe2DSO2cgfcLH58t6Prw4/piJf7P1Qhs/ng5Egfbe RahP8AdTSjRf2ylVHngaM5EFuDsjn/uslEfh93dL4gQH/NWqNgTbk02R0HtwrRjs F8EVNsDqVjWrc1D/tKWUucYVMDdGHmNovN0sGESbuVBgUKwqjrhLMrEzxs2d0cgN h2YRbPnAHRmO/F0x789TGFsw3BU0T/gAl+LrVFUBCKvhwtEtsG6rmX1wjSVca3lM zMhV9oipMUBPSVQQhyIf/yizEqw0YkgFKelQppb6e/I+OBgQGYchgaiQtbNcI4L2 jOkQ7dsFPL8a8bmN+r38IzJMjHp5F+Govuc+1nmhuALpGxgBtmEqzgzBiP3AZgcJ XjkKW3IQdK+TAcqA6eDFLlm33Npe+flXQ/QvJlpYfHLDzZx2CNQExaFVo4meNwDu iGajKgP9DhO6l7uaep2mDxMPz7u/edhfvHy6cjWUIH9KjwPZ64w= =e87E -----END PGP SIGNATURE----- Merge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "A few fixes to the dmaengine drivers: - fix in sprd driver for link list and potential memory leak - tegra transfer failure fix - imx size check fix for script_number - xilinx fix for 64bit AXIDMA and control reg update - qcom bam dma resource leak fix - cppi slave transfer fix when idle" * tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle dmaengine: qcom: bam_dma: Fix resource leak dmaengine: sprd: Fix the possible memory leak issue dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config dmaengine: xilinx_dma: Fix 64-bit simple AXIDMA transfer dmaengine: imx-sdma: fix size check for sdma script_number dmaengine: tegra210-adma: fix transfer failure dmaengine: sprd: Fix the link-list pointer register configuration issue	2019-10-31 07:34:09 +00:00
David S. Miller	3da0966320	Merge branch 'hv_netvsc-fix-error-handling-in-netvsc_attach-set_features' Haiyang Zhang says: ==================== hv_netvsc: fix error handling in netvsc_attach/set_features The error handling code path in these functions are not correct. This patch set fixes them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 18:17:36 -07:00
Haiyang Zhang	719b85c336	hv_netvsc: Fix error handling in netvsc_attach() If rndis_filter_open() fails, we need to remove the rndis device created in earlier steps, before returning an error code. Otherwise, the retry of netvsc_attach() from its callers will fail and hang. Fixes: `7b2ee50c0c` ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 18:17:36 -07:00
Haiyang Zhang	c4509a5ac0	hv_netvsc: Fix error handling in netvsc_set_features() When an error is returned by rndis_filter_set_offload_params(), we should still assign the unaffected features to ndev->features. Otherwise, these features will be missing. Fixes: `d6792a5a07` ("hv_netvsc: Add handler for LRO setting change") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 18:17:36 -07:00
Vishal Kulkarni	fc89cc358f	cxgb4: fix panic when attaching to ULD fail Release resources when attaching to ULD fail. Otherwise, data mismatch is seen between LLD and ULD later on, which lead to kernel panic when accessing resources that should not even exist in the first place. Fixes: `94cdb8bb99` ("cxgb4: Add support for dynamic allocation of resources for ULD") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 18:11:13 -07:00
David S. Miller	d86784fe9b	Merge branch 'Control-action-percpu-counters-allocation-by-netlink-flag' Vlad Buslov says: ==================== Control action percpu counters allocation by netlink flag Currently, significant fraction of CPU time during TC filter allocation is spent in percpu allocator. Moreover, percpu allocator is protected with single global mutex which negates any potential to improve its performance by means of recent developments in TC filter update API that removed rtnl lock for some Qdiscs and classifiers. In order to significantly improve filter update rate and reduce memory usage we would like to allow users to skip percpu counters allocation for specific action if they don't expect high traffic rate hitting the action, which is a reasonable expectation for hardware-offloaded setup. In that case any potential gains to software fast-path performance gained by usage of percpu-allocated counters compared to regular integer counters protected by spinlock are not important, but amount of additional CPU and memory consumed by them is significant. In order to allow configuring action counters allocation type at runtime, implement following changes: - Implement helper functions to update the action counters and use them in affected actions instead of updating counters directly. This steps abstracts actions implementation from counter types that are being used for particular action instance at runtime. - Modify the new helpers to use percpu counters if they were allocated during action initialization and use regular counters otherwise. - Extend action UAPI TCA_ACT space with TCA_ACT_FLAGS field. Add TCA_ACT_FLAGS_NO_PERCPU_STATS action flag and update hardware-offloaded actions to not allocate percpu counters when the flag is set. With this changes users that prefer action update slow-path speed over software fast-path speed can dynamically request actions to skip percpu counters allocation without affecting other users. Now, lets look at actual performance gains provided by this change. Simple test is used to measure insertion rate - iproute2 TC is executed in parallel by xargs in batch mode, its total execution time is measured by shell builtin "time" command. The command runs 20 concurrent tc instances, each with its own batch file with 100k rules: $ time ls add* \| xargs -n 1 -P 20 sudo tc -b Two main rule profiles are tested. First is simple L2 flower classifier with single gact drop action. The configuration is chosen as worst case scenario because with single-action rules pressure on percpu allocator is minimized. Example rule: filter add dev ens1f0 protocol ip ingress prio 1 handle 1 flower skip_hw src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 action drop Second profile is typical real-world scenario that uses flower classifier with some L2-4 fields and two actions (tunnel_key+mirred). Example rule: filter add dev ens1f0_0 protocol ip ingress prio 1 handle 1 flower skip_hw src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 src_ip 192.168.111.1 dst_ip 192.168.111.2 ip_proto udp dst_port 1 src_port 1 action tunnel_key set id 1 src_ip 2.2.2.2 dst_ip 2.2.2.3 dst_port 4789 action mirred egress redirect dev vxlan1 Profile \| percpu \| no_percpu \| X improvement \| (k rules/sec) \| (k rules/sec) \| -------------------+---------------+---------------+--------------- Gact drop \| 203 \| 259 \| 1.28 tunnel_key+mirred \| 92 \| 204 \| 2.22 For simple drop action removing percpu allocation leads to ~25% insertion rate improvement. Perf profiles highlights the bottlenecks. Perf profile of run with percpu allocation (gact drop): + 89.11% 0.48% tc [kernel.vmlinux] [k] entry_SYSCALL_64 + 88.58% 0.04% tc [kernel.vmlinux] [k] do_syscall_64 + 87.50% 0.04% tc libc-2.29.so [.] __libc_sendmsg + 86.96% 0.04% tc [kernel.vmlinux] [k] __sys_sendmsg + 86.85% 0.01% tc [kernel.vmlinux] [k] ___sys_sendmsg + 86.60% 0.05% tc [kernel.vmlinux] [k] sock_sendmsg + 86.55% 0.12% tc [kernel.vmlinux] [k] netlink_sendmsg + 86.04% 0.13% tc [kernel.vmlinux] [k] netlink_unicast + 85.42% 0.03% tc [kernel.vmlinux] [k] netlink_rcv_skb + 84.68% 0.04% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg + 84.56% 0.24% tc [kernel.vmlinux] [k] tc_new_tfilter + 75.73% 0.65% tc [cls_flower] [k] fl_change + 71.30% 0.03% tc [kernel.vmlinux] [k] tcf_exts_validate + 71.27% 0.13% tc [kernel.vmlinux] [k] tcf_action_init + 71.06% 0.01% tc [kernel.vmlinux] [k] tcf_action_init_1 + 70.41% 0.04% tc [act_gact] [k] tcf_gact_init + 53.59% 1.21% tc [kernel.vmlinux] [k] __mutex_lock.isra.0 + 52.34% 0.34% tc [kernel.vmlinux] [k] tcf_idr_create - 51.23% 2.17% tc [kernel.vmlinux] [k] pcpu_alloc - 49.05% pcpu_alloc + 39.35% __mutex_lock.isra.0 4.99% memset_erms + 2.16% pcpu_alloc_area + 2.17% __libc_sendmsg + 45.89% 44.33% tc [kernel.vmlinux] [k] osq_lock + 9.94% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc + 7.76% 0.00% tc [kernel.vmlinux] [k] tcf_idr_insert + 6.50% 0.03% tc [kernel.vmlinux] [k] tfilter_notify + 6.24% 6.11% tc [kernel.vmlinux] [k] mutex_spin_on_owner + 5.73% 5.32% tc [kernel.vmlinux] [k] memset_erms + 5.31% 0.18% tc [kernel.vmlinux] [k] tcf_fill_node Here bottleneck is clearly in pcpu_alloc() function that takes more than half CPU time, which is mostly wasted busy-waiting for internal percpu allocator global lock. With percpu allocation removed (gact drop): + 87.50% 0.51% tc [kernel.vmlinux] [k] entry_SYSCALL_64 + 86.94% 0.07% tc [kernel.vmlinux] [k] do_syscall_64 + 85.75% 0.04% tc libc-2.29.so [.] __libc_sendmsg + 85.00% 0.07% tc [kernel.vmlinux] [k] __sys_sendmsg + 84.84% 0.07% tc [kernel.vmlinux] [k] ___sys_sendmsg + 84.59% 0.01% tc [kernel.vmlinux] [k] sock_sendmsg + 84.58% 0.14% tc [kernel.vmlinux] [k] netlink_sendmsg + 83.95% 0.12% tc [kernel.vmlinux] [k] netlink_unicast + 83.34% 0.01% tc [kernel.vmlinux] [k] netlink_rcv_skb + 82.39% 0.12% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg + 82.16% 0.25% tc [kernel.vmlinux] [k] tc_new_tfilter + 75.13% 0.84% tc [cls_flower] [k] fl_change + 69.92% 0.05% tc [kernel.vmlinux] [k] tcf_exts_validate + 69.87% 0.11% tc [kernel.vmlinux] [k] tcf_action_init + 69.61% 0.02% tc [kernel.vmlinux] [k] tcf_action_init_1 - 68.80% 0.10% tc [act_gact] [k] tcf_gact_init - 68.70% tcf_gact_init + 36.08% tcf_idr_check_alloc + 31.88% tcf_idr_insert + 63.72% 0.58% tc [kernel.vmlinux] [k] __mutex_lock.isra.0 + 58.80% 56.68% tc [kernel.vmlinux] [k] osq_lock + 36.08% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc + 31.88% 0.01% tc [kernel.vmlinux] [k] tcf_idr_insert The gact actions (like all other actions types) are inserted in single idr instance protected by global (per namespace) lock that becomes new bottleneck with such simple rule profile and prevents achieving 2x+ performance increase that can be expected by looking at profiling data for insertion action with percpu counter. Perf profile of run with percpu allocation (tunnel_key+mirred): + 91.95% 0.21% tc [kernel.vmlinux] [k] entry_SYSCALL_64 + 91.74% 0.06% tc [kernel.vmlinux] [k] do_syscall_64 + 90.74% 0.01% tc libc-2.29.so [.] __libc_sendmsg + 90.52% 0.01% tc [kernel.vmlinux] [k] __sys_sendmsg + 90.50% 0.04% tc [kernel.vmlinux] [k] ___sys_sendmsg + 90.41% 0.02% tc [kernel.vmlinux] [k] sock_sendmsg + 90.38% 0.04% tc [kernel.vmlinux] [k] netlink_sendmsg + 90.10% 0.06% tc [kernel.vmlinux] [k] netlink_unicast + 89.76% 0.01% tc [kernel.vmlinux] [k] netlink_rcv_skb + 89.28% 0.04% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg + 89.15% 0.03% tc [kernel.vmlinux] [k] tc_new_tfilter + 83.41% 0.33% tc [cls_flower] [k] fl_change + 81.17% 0.04% tc [kernel.vmlinux] [k] tcf_exts_validate + 81.13% 0.06% tc [kernel.vmlinux] [k] tcf_action_init + 81.04% 0.04% tc [kernel.vmlinux] [k] tcf_action_init_1 - 73.59% 2.16% tc [kernel.vmlinux] [k] pcpu_alloc - 71.42% pcpu_alloc + 61.41% __mutex_lock.isra.0 5.02% memset_erms + 2.93% pcpu_alloc_area + 2.16% __libc_sendmsg + 63.58% 0.17% tc [kernel.vmlinux] [k] tcf_idr_create + 63.40% 0.60% tc [kernel.vmlinux] [k] __mutex_lock.isra.0 + 57.85% 56.38% tc [kernel.vmlinux] [k] osq_lock + 46.27% 0.13% tc [act_tunnel_key] [k] tunnel_key_init + 34.26% 0.02% tc [act_mirred] [k] tcf_mirred_init + 10.99% 0.00% tc [kernel.vmlinux] [k] dst_cache_init + 5.32% 5.11% tc [kernel.vmlinux] [k] memset_erms With two times more actions pressure on percpu allocator doubles, so now it takes ~74% of CPU execution time. With percpu allocation removed (tunnel_key+mirred): + 86.02% 0.50% tc [kernel.vmlinux] [k] entry_SYSCALL_64 + 85.51% 0.12% tc [kernel.vmlinux] [k] do_syscall_64 + 84.40% 0.03% tc libc-2.29.so [.] __libc_sendmsg + 83.84% 0.03% tc [kernel.vmlinux] [k] __sys_sendmsg + 83.72% 0.01% tc [kernel.vmlinux] [k] ___sys_sendmsg + 83.56% 0.01% tc [kernel.vmlinux] [k] sock_sendmsg + 83.50% 0.08% tc [kernel.vmlinux] [k] netlink_sendmsg + 83.02% 0.17% tc [kernel.vmlinux] [k] netlink_unicast + 82.48% 0.00% tc [kernel.vmlinux] [k] netlink_rcv_skb + 81.89% 0.11% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg + 81.71% 0.25% tc [kernel.vmlinux] [k] tc_new_tfilter + 73.99% 0.63% tc [cls_flower] [k] fl_change + 69.72% 0.00% tc [kernel.vmlinux] [k] tcf_exts_validate + 69.72% 0.09% tc [kernel.vmlinux] [k] tcf_action_init + 69.53% 0.05% tc [kernel.vmlinux] [k] tcf_action_init_1 + 53.08% 0.91% tc [kernel.vmlinux] [k] __mutex_lock.isra.0 + 45.52% 43.99% tc [kernel.vmlinux] [k] osq_lock - 36.02% 0.21% tc [act_tunnel_key] [k] tunnel_key_init - 35.81% tunnel_key_init + 15.95% tcf_idr_check_alloc + 13.91% tcf_idr_insert - 4.70% dst_cache_init + 4.68% pcpu_alloc + 33.22% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc + 32.34% 0.05% tc [act_mirred] [k] tcf_mirred_init + 28.24% 0.01% tc [kernel.vmlinux] [k] tcf_idr_insert + 7.79% 0.05% tc [kernel.vmlinux] [k] idr_alloc_u32 + 7.67% 7.35% tc [kernel.vmlinux] [k] idr_get_free + 6.46% 6.22% tc [kernel.vmlinux] [k] mutex_spin_on_owner + 5.11% 0.05% tc [kernel.vmlinux] [k] tfilter_notify With percpu allocation removed insertion rate is increased by ~120%. Such rule profile scales much better than simple single action because both types of actions were competing for single lock in percpu allocator, but not for action idr lock, which is per-action. Note that percpu allocator is still used by dst_cache in tunnel_key actions and consumes 4.68% CPU time. Dst_cache seems like good opportunity for further insertion rate optimization but is not addressed by this change. Another improvement provided by this change is significantly reduced memory usage. The test is implemented by sampling "used memory" value from "vmstat -s" command output. Following table includes memory usage measurements for same two configurations that were used for measuring insertion rate: Profile \| Mem per rule \| Mem per rule no_percpu \| Less memory used \| (KB) \| (KB) \| (KB) -------------------+--------------+------------------------+------------------ Gact drop \| 3.91 \| 2.51 \| 1.4 tunnel_key+mirred \| 6.73 \| 3.91 \| 2.8 Results indicate that memory usage of percpu allocator per action is ~1.4 KB. Note that any measurements of percpu allocator memory usage is inherently tied to particular setup since memory usage is linear to number of cores in system. It is to be expected that on current top of the line servers percpu allocator memory usage will be 2-5x more than on 24 CPUs setup that was used for testing. Setup details: 2x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 32GB memory Patches applied on top of net-next branch: commit `2203cbf2c8` (net-next) Author: Russell King <rmk+kernel@armlinux.org.uk> Date: Tue Oct 15 11:38:39 2019 +0100 net: sfp: move fwnode parsing into sfp-bus layer Changes V1 -> V2: - Include memory measurements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 18:07:51 -07:00
Vlad Buslov	9ae6b78708	tc-testing: implement tests for new fast_init action flag Add basic tests to verify action creation with new fast_init flag for all actions that support the flag. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 18:07:51 -07:00
Vlad Buslov	e382267860	net: sched: update action implementations to support flags Extend struct tc_action with new "tcfa_flags" field. Set the field in tcf_idr_create() function and provide new helper tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags value. Update individual hardware-offloaded actions init() to pass their "flags" argument to new helper in order to skip percpu stats allocation when user requested it through flags. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 18:07:51 -07:00
Vlad Buslov	abbb0d3363	net: sched: extend TCA_ACT space with TCA_ACT_FLAGS Extend TCA_ACT space with nla_bitfield32 flags. Add TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in tcf_action_init_1() and pass resulting value as additional argument to a_o->init(). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 18:07:50 -07:00

... 2 3 4 5 6 ...

873464 Commits