linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Linus Torvalds	f67e3fb489	device-dax for 5.1 * Replace the /sys/class/dax device model with /sys/bus/dax, and include a compat driver so distributions can opt-in to the new ABI. * Allow for an alternative driver for the device-dax address-range * Introduce the 'kmem' driver to hotplug / assign a device-dax address-range to the core-mm. * Arrange for the device-dax target-node to be onlined so that the newly added memory range can be uniquely referenced by numa apis. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJchWpGAAoJEB7SkWpmfYgCJk8P/0Q1DINszUDO/vKjJ09cDs9P Jw3it6GBIL50rDOu9QdcprSpwYDD0h1mLAV/m6oa3bVO+p4uWGvnxaxRx2HN2c/v vhZFtUDpHlqR63vzWMNVKRprYixCRJDUr6xQhhCcE3ak/ELN6w7LWfikKVWv15UL MfR96IQU38f+xRda/zSXnL9606Dvkvu/inEHj84lRcHIwj3sQAUalrE8bR3O32gZ bDg/l5kzT49o8ZXUo/TegvRSSSZpJmOl2DD0RW+ax5q3NI2bOXFrVDUKBKxf/hcQ E/V9i57TrqQx0GqRhnU7rN/v53cFZGGs31TEEIB/xs3bzCnADxwXcjL5b5K005J6 vJjBA2ODBewHFK3uVx46Hy1iV4eCtZWj4QrMnrjdSrjXOfbF5GTbWOhPFgoq7TWf S7VqFEf3I2gDPaMq4o8Ej1kLH4HMYeor2NSOZjyvGn87rSZ3ZIQguwbaNIVl+itz gdDt0ZOU0BgOBkV+rZIeZDaGdloWCHcDPL15CkZaOZyzdWhfEZ7dod6ad+9udilU EUPH62RgzXZtfm5zpebYyjNVLbb9pLZ0nT+UypyGR6zqWx1SqU3mXi63NFXPco+x XA9j//edPeI6NHg2CXLEh8DLuCg3dG1zWRJANkiF+niBwyCR8CHtGWAoY6soXbKe 2UrXGcIfXxyJ8V9v8v4q =hfa3 -----END PGP SIGNATURE----- Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull device-dax updates from Dan Williams: "New device-dax infrastructure to allow persistent memory and other "reserved" / performance differentiated memories, to be assigned to the core-mm as "System RAM". Some users want to use persistent memory as additional volatile memory. They are willing to cope with potential performance differences, for example between DRAM and 3D Xpoint, and want to use typical Linux memory management apis rather than a userspace memory allocator layered over an mmap() of a dax file. The administration model is to decide how much Persistent Memory (pmem) to use as System RAM, create a device-dax-mode namespace of that size, and then assign it to the core-mm. The rationale for device-dax is that it is a generic memory-mapping driver that can be layered over any "special purpose" memory, not just pmem. On subsequent boots udev rules can be used to restore the memory assignment. One implication of using pmem as RAM is that mlock() no longer keeps data off persistent media. For this reason it is recommended to enable NVDIMM Security (previously merged for 5.0) to encrypt pmem contents at rest. We considered making this recommendation an actively enforced requirement, but in the end decided to leave it as a distribution / administrator policy to allow for emulation and test environments that lack security capable NVDIMMs. Summary: - Replace the /sys/class/dax device model with /sys/bus/dax, and include a compat driver so distributions can opt-in to the new ABI. - Allow for an alternative driver for the device-dax address-range - Introduce the 'kmem' driver to hotplug / assign a device-dax address-range to the core-mm. - Arrange for the device-dax target-node to be onlined so that the newly added memory range can be uniquely referenced by numa apis" NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because we currently have special - and very annoying rules in the kernel about accessing PMEM only with the "MC safe" accessors, because machine checks inside the regular repeat string copy functions can be fatal in some (not described) circumstances. And apparently the PMEM modules can cause that a lot more than regular RAM. The argument is that this happens because PMEM doesn't necessarily get scrubbed at boot like RAM does, but that is planned to be added for the user space tooling. Quoting Dan from another email: "The exposure can be reduced in the volatile-RAM case by scanning for and clearing errors before it is onlined as RAM. The userspace tooling for that can be in place before v5.1-final. There's also runtime notifications of errors via acpi_nfit_uc_error_notify() from background scrubbers on the DIMM devices. With that mechanism the kernel could proactively clear newly discovered poison in the volatile case, but that would be additional development more suitable for v5.2. I understand the concern, and the need to highlight this issue by tapping the brakes on feature development, but I don't see PMEM as RAM making the situation worse when the exposure is also there via DAX in the PMEM case. Volatile-RAM is arguably a safer use case since it's possible to repair pages where the persistent case needs active application coordination" * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: "Hotplug" persistent memory for use like normal RAM mm/resource: Let walk_system_ram_range() search child resources mm/memory-hotplug: Allow memory resources to be children mm/resource: Move HMM pr_debug() deeper into resource code mm/resource: Return real error codes from walk failures device-dax: Add a 'modalias' attribute to DAX 'bus' devices device-dax: Add a 'target_node' attribute device-dax: Auto-bind device after successful new_id acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node device-dax: Add /sys/class/dax backwards compatibility device-dax: Add support for a dax override driver device-dax: Move resource pinning+mapping into the common driver device-dax: Introduce bus + driver model device-dax: Start defining a dax bus model device-dax: Remove multi-resource infrastructure device-dax: Kill dax_region base device-dax: Kill dax_region ida	2019-03-16 13:05:32 -07:00
Dan Williams	4083014e32	Merge branch 'for-5.1/nfit/ars' into libnvdimm-for-next Merge several updates to the ARS implementation. Highlights include: * Support retrieval of short-ARS results if the ARS state is "requires continuation", and even if the "no_init_ars" module parameter is specified. * Allow busy-polling of the kernel ARS state by allowing root to reset the exponential back-off timer. * Filter potentially stale ARS results by tracking query-ARS relative to the previous start-ARS.	2019-03-11 12:37:55 -07:00
Dan Williams	451fed24e9	Merge branch 'for-5.1/libnvdimm' into libnvdimm-for-next Merge miscellaneous libnvdimm sub-system updates for v5.1. Highlights include: * Support for the Hyper-V family of device-specific-methods (DSMs) * Several fixes and workarounds for Hyper-V compatibility. * Fix for the support to cache the dirty-shutdown-count at init.	2019-03-11 12:13:42 -07:00
Toshi Kani	5c9d62d002	acpi/nfit: Update NFIT flags error message ACPI NFIT flags field reports major errors on NVDIMM, which need user's attention. Update the current log to a proper error message with dev_err(). The current message string is kept for grep-compatibility. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Robert Elliott <elliott@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-03-01 09:44:59 -08:00
Dan Williams	78153dd45e	nfit/ars: Avoid stale ARS results Gate ARS result consumption on whether the OS issued start-ARS since the previous consumption. The BIOS may only clear its result buffers after a successful start-ARS. Fixes: `0caeef63e6` ("libnvdimm: Add a poison list and export badblocks") Cc: <stable@vger.kernel.org> Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-02-20 14:18:59 -08:00
Dan Williams	5479b2757f	nfit/ars: Allow root to busy-poll the ARS state machine The ARS implementation implements exponential back-off on the poll interval to prevent high-frequency access to the DIMM / platform interface. Depending on when the ARS completes the poll interval may exceed the completion event by minutes. Allow root to reset the timeout each time it probes the status. A one-second timeout is still enforced, but root can otherwise can control the poll interval. Fixes: `bc6ba80858` ("nfit, address-range-scrub: rework and simplify ARS...") Cc: <stable@vger.kernel.org> Reported-by: Erwin Tsaur <erwin.tsaur@oracle.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-02-20 14:18:59 -08:00
Dan Williams	e34b8252a3	nfit/ars: Introduce scrub_flags In preparation for introducing new flags to gate whether ARS results are stale, or poll the completion state, convert the existing flags to an unsigned long with enumerated values. This conversion allows the flags to be atomically updated outside of ->init_mutex. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-02-20 14:18:59 -08:00
Dan Williams	317a992ab9	nfit/ars: Remove ars_start_flags The ars_start_flags property of 'struct acpi_nfit_desc' is no longer used since ARS_REQ_SHORT and ARS_REQ_LONG were added. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-02-20 14:18:59 -08:00
Dan Williams	fa3ed4d981	nfit/ars: Attempt short-ARS even in the no_init_ars case The no_init_ars option is meant to prevent long-ARS, but short-ARS should be allowed to grab any immediate results. Fixes: `bc6ba80858` ("nfit, address-range-scrub: rework and simplify ARS...") Cc: <stable@vger.kernel.org> Reported-by: Erwin Tsaur <erwin.tsaur@oracle.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-02-20 14:18:59 -08:00
Dan Williams	c6c5df293b	nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot If query-ARS reports that ARS has stopped and requires continuation attempt to retrieve short-ARS results before continuing the long operation. Fixes: `bc6ba80858` ("nfit, address-range-scrub: rework and simplify ARS...") Cc: <stable@vger.kernel.org> Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-02-13 08:58:40 -08:00
Dan Williams	0171b6b781	acpi/nfit: Require opt-in for read-only label configurations Recent fixes to command handling enabled Linux to read label configurations that it could not before. Unfortunately that means that configurations that were operating in label-less mode will be broken as the kernel ignores the existing namespace configuration and tries to honor the new found labels. Fortunately this seems limited to a case where Linux can quirk the behavior and maintain the existing label-less semantics by default. When the platform does not emit an _LSW method, disable all label access methods. Provide a 'force_labels' module parameter to allow read-only label operation. Fixes: `11189c1089` ("acpi/nfit: Fix command-supported detection") Reported-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-02-12 20:14:15 -08:00
Dan Williams	ebe9f6f19d	acpi/nfit: Fix bus command validation Commit `11189c1089` "acpi/nfit: Fix command-supported detection" broke ND_CMD_CALL for bus-level commands. The "func = cmd" assumption is only valid for: ND_CMD_ARS_CAP ND_CMD_ARS_START ND_CMD_ARS_STATUS ND_CMD_CLEAR_ERROR The function number otherwise needs to be pulled from the command payload for: NFIT_CMD_TRANSLATE_SPA NFIT_CMD_ARS_INJECT_SET NFIT_CMD_ARS_INJECT_CLEAR NFIT_CMD_ARS_INJECT_GET Update cmd_to_func() for the bus case and call it in the common path. Fixes: `11189c1089` ("acpi/nfit: Fix command-supported detection") Cc: <stable@vger.kernel.org> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Grzegorz Burzynski <grzegorz.burzynski@intel.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-02-07 14:56:50 -08:00
Dan Williams	d5d30d5a5c	libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM family As Dexuan reports the NVDIMM_FAMILY_HYPERV platform is incompatible with the existing Linux namespace implementation because it uses NSLABEL_FLAG_LOCAL for x1-width PMEM interleave sets. Quirk it as an platform / DIMM that does not provide BLK-aperture access. Allow the libnvdimm core to assume no potential for aliasing. In case other implementations make the same mistake, provide a "noblk" module parameter to force-enable the quirk. Link: https://lkml.kernel.org/r/PU1P153MB0169977604493B82B662A01CBF920@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM Reported-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-02-02 16:35:26 -08:00
Dexuan Cui	1194c41331	nfit: Add Hyper-V NVDIMM DSM command set to white list Add the Hyper-V _DSM command set to the white list of NVDIMM command sets. This command set is documented at http://www.uefi.org/RFIC_LIST (see "Virtual NVDIMM 0x1901"). Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-29 22:09:31 -08:00
Dexuan Cui	43f89877f2	nfit: acpi_nfit_ctl(): Check out_obj->type in the right place In the case of ND_CMD_CALL, we should also check out_obj->type. The patch uses out_obj->type, which is a short alias to out_obj->package.type. Fixes: `31eca76ba2` ("nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism") Cc: <stable@vger.kernel.org> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-29 22:08:34 -08:00
Dan Williams	f596c8844f	nfit: Fix nfit_intel_shutdown_status() command submission The implementation is broken in all the ways the unit test did not touch: 1/ The local definition of in_buf and in_obj violated C99 initializer expectations for zeroing. By only initializing 2 out of the three struct members the compiler was free to zero-initialize the remaining entry even though the aliased location in the union was initialized. 2/ The implementation made assumptions about the state of the 'smart' payload after command execution that are satisfied by acpi_nfit_ctl(), but not acpi_evaluate_dsm(). 3/ populate_shutdown_status() is skipped on Intel NVDIMMs due to the early return for skipping the common _LS{I,R,W} enabling. 4/ The input length should be zero. This breakage was missed due to the unit test implementation only testing the case where nfit_intel_shutdown_status() returns a valid payload. Much of this complexity would be saved if acpi_nfit_ctl() could be used, but that currently requires a 'struct nvdimm *' argument and one is not created until later in the init process. The health result is needed before the device is created because the payload gates whether the nmemX/nfit/dirty_shutdown property is visible in sysfs. Cc: <stable@vger.kernel.org> Fixes: `0ead11181f` ("acpi, nfit: Collect shutdown status") Reported-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-29 22:08:34 -08:00
Dan Williams	11189c1089	acpi/nfit: Fix command-supported detection The _DSM function number validation only happens to succeed when the generic Linux command number translation corresponds with a DSM-family-specific function number. This breaks NVDIMM-N implementations that correctly implement _LSR, _LSW, and _LSI, but do not happen to publish support for DSM function numbers 4, 5, and 6. Recall that the support for _LS{I,R,W} family of methods results in the DIMM being marked as supporting those command numbers at acpi_nfit_register_dimms() time. The DSM function mask is only used for ND_CMD_CALL support of non-NVDIMM_FAMILY_INTEL devices. Fixes: `31eca76ba2` ("nfit, libnvdimm: limited/whitelisted dimm command...") Cc: <stable@vger.kernel.org> Link: https://github.com/pmem/ndctl/issues/78 Reported-by: Sujith Pandel <sujith_pandel@dell.com> Tested-by: Sujith Pandel <sujith_pandel@dell.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-21 09:58:31 -08:00
Dan Williams	5e9e38d0db	acpi/nfit: Block function zero DSMs In preparation for using function number 0 as an error value, prevent it from being considered a valid function value by acpi_nfit_ctl(). Cc: <stable@vger.kernel.org> Cc: stuart hayes <stuart.w.hayes@gmail.com> Fixes: `e02fb7264d` ("nfit: add Microsoft NVDIMM DSM command set...") Reported-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-21 09:58:29 -08:00
Dan Williams	1cd7386549	libnvdimm/security: Require nvdimm_security_setup_events() to succeed The following warning: ACPI0012:00: security event setup failed: -19 ...is meant to capture exceptional failures of sysfs_get_dirent(), however it will also fail in the common case when security support is disabled. A few issues: 1/ A dev_warn() report for a common case is too chatty 2/ The setup of this notifier is generic, no need for it to be driven from the nfit driver, it can exist completely in the core. 3/ If it fails for any reason besides security support being disabled, that's fatal and should abort DIMM activation. Userspace may hang if it never gets overwrite notifications. 4/ The dirent needs to be released. Move the call to the core 'dimm' driver, make it conditional on security support being active, make it fatal for the exceptional case, add the missing sysfs_put() at device disable time. Fixes: `7d988097c5` ("...Add security DSM overwrite support") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-21 09:57:43 -08:00
Wei Yang	b4fe30e45a	acpi/nfit: Remove duplicate set nd_set in acpi_nfit_init_interleave_set() We allocate nd_set in acpi_nfit_init_interleave_set() and assignn it to ndr_desc, while the assignment is done twice in this function. This patch removes the first assignment. No functional change. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-14 18:59:08 -08:00
Tony Luck	0919871ac3	acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id() Possible race accessing memdev structures after dropping the mutex. Dan Williams says this could race against another thread that is doing: # echo "ACPI0012:00" > /sys/bus/acpi/drivers/nfit/unbind Reported-by: Jane Chu <jane.chu@oracle.com> Fixes: `23222f8f8d` ("acpi, nfit: Add function to look up nvdimm...") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-11 14:50:29 -08:00
Xiaochun Lee	8a7f02f67c	ACPI/nfit: delete the function to_acpi_nfit_desc The function to_acpi_nfit_desc and function to_acpi_desc do the same things,delete the function to_acpi_nfit_desc, and keep the inline function to_acpi_desc. Signed-off-by: Xiaochun Lee <lixc17@lenovo.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-08 17:28:46 -08:00
Xiaochun Lee	dadbcb450c	ACPI/nfit: delete the redundant header file The header file "intel.h" is repeated here, So delete one. Signed-off-by: Xiaochun Lee <lixc17@lenovo.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-08 17:28:45 -08:00
Dan Williams	8fc5c73554	acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware Interface Table), is the first known instance of a memory range described by a unique "target" proximity domain. Where "initiator" and "target" proximity domains is an approach that the ACPI HMAT (Heterogeneous Memory Attributes Table) uses to described the unique performance properties of a memory range relative to a given initiator (e.g. CPU or DMA device). Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y char-device follows the traditional notion of 'numa-node' where the attribute conveys the closest online numa-node. That numa-node attribute is useful for cpu-binding and memory-binding processes near the device. However, when the memory range backing a 'pmem', or 'dax' device is onlined (memory hot-add) the memory-only-numa-node representing that address needs to be differentiated from the set of online nodes. In other words, the numa-node association of the device depends on whether you can bind processes near the cpu-numa-node in the offline device-case, or bind process on the memory-range directly after the backing address range is onlined. Allow for the case that platform firmware describes persistent memory with a unique proximity domain, i.e. when it is distinct from the proximity of DRAM and CPUs that are on the same socket. Plumb the Linux numa-node translation of that proximity through the libnvdimm region device to namespaces that are in device-dax mode. With this in place the proposed kmem driver [1] can optionally discover a unique numa-node number for the address range as it transitions the memory from an offline state managed by a device-driver to an online memory range managed by the core-mm. [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com Reported-by: Fan Du <fan.du@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2019-01-06 21:41:57 -08:00
Dan Williams	4b5f747e82	Merge miscellaneous libnvdimm updates for 4.21 * Use common helpers, bitmap_zalloc() and kstrndup(), to replace open coded versions. * Clarify the comments around hotplug vs initial init case for the nfit driver. * Cleanup the libnvdimm init path.	2018-12-27 19:54:10 -08:00
Dave Jiang	89fa9d8ea7	acpi/nfit, libnvdimm/security: add Intel DSM 1.8 master passphrase support With Intel DSM 1.8 [1] two new security DSMs are introduced. Enable/update master passphrase and master secure erase. The master passphrase allows a secure erase to be performed without the user passphrase that is set on the NVDIMM. The commands of master_update and master_erase are added to the sysfs knob in order to initiate the DSMs. They are similar in opeartion mechanism compare to update and erase. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-12-21 12:44:41 -08:00
Dave Jiang	7d988097c5	acpi/nfit, libnvdimm/security: Add security DSM overwrite support Add support for the NVDIMM_FAMILY_INTEL "ovewrite" capability as described by the Intel DSM spec v1.7. This will allow triggering of overwrite on Intel NVDIMMs. The overwrite operation can take tens of minutes. When the overwrite DSM is issued successfully, the NVDIMMs will be unaccessible. The kernel will do backoff polling to detect when the overwrite process is completed. According to the DSM spec v1.7, the 128G NVDIMMs can take up to 15mins to perform overwrite and larger DIMMs will take longer. Given that overwrite puts the DIMM in an indeterminate state until it completes introduce the NDD_SECURITY_OVERWRITE flag to prevent other operations from executing when overwrite is happening. The NDD_WORK_PENDING flag is added to denote that there is a device reference on the nvdimm device for an async workqueue thread context. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-12-21 12:44:41 -08:00
Dave Jiang	f298939655	acpi/nfit, libnvdimm: Introduce nvdimm_security_ops Some NVDIMMs, like the ones defined by the NVDIMM_FAMILY_INTEL command set, expose a security capability to lock the DIMMs at poweroff and require a passphrase to unlock them. The security model is derived from ATA security. In anticipation of other DIMMs implementing a similar scheme, and to abstract the core security implementation away from the device-specific details, introduce nvdimm_security_ops. Initially only a status retrieval operation, ->state(), is defined, along with the base infrastructure and definitions for future operations. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-12-13 17:54:13 -08:00
Dave Jiang	d6548ae4d1	acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimm The generated dimm id is needed for the sysfs attribute as well as being used as the identifier/description for the security key. Since it's constant and should never change, store it as a member of struct nvdimm. As nvdimm_create() continues to grow parameters relative to NFIT driver requirements, do not require other implementations to keep pace. Introduce __nvdimm_create() to carry the new parameters and keep nvdimm_create() with the long standing default api. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-12-13 17:54:12 -08:00
Ocean He	9f619d4769	ACPI/nfit: Adjust annotation for why return 0 if fail to find NFIT at start Add detailed explanation for why it's ok to return 0 if we fail to find an NFIT at startup. Refer to chapter 9.20.2 NVDIMM Root Device in ACPI 6.2 spec. Signed-off-by: Ocean He <hehy1@lenovo.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-12-10 15:41:50 -08:00
Dan Williams	b5fd2e00a6	acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short" A "short" ARS (address range scrub) instructs the platform firmware to return known errors. In contrast, a "long" ARS instructs platform firmware to arrange every data address on the DIMM to be read / checked for poisoned data. The conversion of the flags in commit `d3abaf43ba` "acpi, nfit: Fix Address Range Scrub completion tracking", changed the meaning of passing '0' to acpi_nfit_ars_rescan(). Previously '0' meant "not short", now '0' is ARS_REQ_SHORT. Pass ARS_REQ_LONG to restore the expected scrub-type behavior of user-initiated ARS sessions. Fixes: `d3abaf43ba` ("acpi, nfit: Fix Address Range Scrub completion tracking") Reported-by: Jacek Zloch <jacek.zloch@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-12-05 14:16:13 -08:00
Dave Jiang	b3ed2ce024	acpi/nfit: Add support for Intel DSM 1.8 commands Add command definition for security commands defined in Intel DSM specification v1.8 [1]. This includes "get security state", "set passphrase", "unlock unit", "freeze lock", "secure erase", "overwrite", "overwrite query", "master passphrase enable/disable", and "master erase", . Since this adds several Intel definitions, move the relevant bits to their own header. These commands mutate physical data, but that manipulation is not cache coherent. The requirement to flush and invalidate caches makes these commands unsuitable to be called from userspace, so extra logic is added to detect and block these commands from being submitted via the ioctl command submission path. Lastly, the commands may contain sensitive key material that should not be dumped in a standard debug session. Update the nvdimm-command payload-dump facility to move security command payloads behind a default-off compile time switch. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-12-04 10:31:11 -08:00
Dan Williams	2121db0963	Revert "acpi, nfit: Further restrict userspace ARS start requests" The following lockdep splat results from acquiring the init_mutex in acpi_nfit_clear_to_send(): WARNING: possible circular locking dependency detected lt-daxdev-error/7216 is trying to acquire lock: 00000000f694db15 (&acpi_desc->init_mutex){+.+.}, at: acpi_nfit_clear_to_send+0x27/0x80 [nfit] but task is already holding lock: 00000000182298f2 (&nvdimm_bus->reconfig_mutex){+.+.}, at: __nd_ioctl+0x457/0x610 [libnvdimm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&nvdimm_bus->reconfig_mutex){+.+.}: nvdimm_badblocks_populate+0x41/0x150 [libnvdimm] nd_region_notify+0x95/0xb0 [libnvdimm] nd_device_notify+0x40/0x50 [libnvdimm] ars_complete+0x7f/0xd0 [nfit] acpi_nfit_scrub+0xbb/0x410 [nfit] process_one_work+0x22b/0x5c0 worker_thread+0x3c/0x390 kthread+0x11e/0x140 ret_from_fork+0x3a/0x50 -> #0 (&acpi_desc->init_mutex){+.+.}: __mutex_lock+0x83/0x980 acpi_nfit_clear_to_send+0x27/0x80 [nfit] __nd_ioctl+0x474/0x610 [libnvdimm] nd_ioctl+0xa4/0xb0 [libnvdimm] do_vfs_ioctl+0xa5/0x6e0 ksys_ioctl+0x70/0x80 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe New infrastructure is needed to be able to perform this check without acquiring the lock. Fixes: `594861215c` ("acpi, nfit: Further restrict userspace ARS start") Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-11-10 09:54:28 -08:00
Dan Williams	3fa58dcab5	acpi, nfit: Fix ARS overflow continuation When the platform BIOS is unable to report all the media error records it requires the OS to restart the scrub at a prescribed location. The driver detects the overflow condition, but then fails to report it to the ARS state machine after reaping the records. Propagate -ENOSPC correctly to continue the ARS operation. Cc: <stable@vger.kernel.org> Fixes: `1cf03c00e7` ("nfit: scrub and register regions in a workqueue") Reported-by: Jacek Zloch <jacek.zloch@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-11-10 09:54:28 -08:00
Dan Williams	594861215c	acpi, nfit: Further restrict userspace ARS start requests In addition to not allowing ARS start while the background thread is actively running, prevent ARS start while any scrub request is pending. This aligns the window for ARS start submission with the status of ARS reported via sysfs. Previously userspace could sneak its own ARS start requests in while sysfs reported -EBUSY. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-17 14:02:32 -07:00
Dan Williams	d3abaf43ba	acpi, nfit: Fix Address Range Scrub completion tracking The Address Range Scrub implementation tried to skip running scrubs against ranges that were already scrubbed by the BIOS. Unfortunately that support also resulted in early scrub completions as evidenced by this debug output from nfit_test: nd_region region9: ARS: range 1 short complete nd_region region3: ARS: range 1 short complete nd_region region4: ARS: range 2 ARS start (0) nd_region region4: ARS: range 2 short complete ...i.e. completions without any indications that the scrub was started. This state of affairs was hard to see in the code due to the proliferation of state bits and mistakenly trying to track done state per-range when the completion is a global property of the bus. So, kill the four ARS state bits (ARS_REQ, ARS_REQ_REDO, ARS_DONE, and ARS_SHORT), and replace them with just 2 request flags ARS_REQ_SHORT and ARS_REQ_LONG. The implementation will still complete and reap the results of BIOS initiated ARS, but it will not attempt to use that information to affect the completion status of scrubbing the ranges from a Linux perspective. Instead, try to synchronously run a short ARS per range at init time and schedule a long scrub in the background. If ARS is busy with an ARS request, schedule both a short and a long scrub for when ARS returns to idle. This logic also satisfies the intent of what ARS_REQ_REDO was trying to achieve. The new rule is that the REQ flag stays set until the next successful ars_start() for that range. With the new policy that the REQ flags are not cleared until the next start, the implementation no longer loses requests as can be seen from the following log: nd_region region3: ARS: range 1 ARS start short (0) nd_region region9: ARS: range 1 ARS start short (0) nd_region region3: ARS: range 1 complete nd_region region4: ARS: range 2 ARS start short (0) nd_region region9: ARS: range 1 complete nd_region region9: ARS: range 1 ARS start long (0) nd_region region4: ARS: range 2 complete nd_region region3: ARS: range 1 ARS start long (0) nd_region region9: ARS: range 1 complete nd_region region3: ARS: range 1 complete nd_region region4: ARS: range 2 ARS start long (0) nd_region region4: ARS: range 2 complete ...note that the nfit_test emulated driver provides 2 buses, that is why some of the range indices are duplicated. Notice that each range now successfully completes a short and long scrub. Cc: <stable@vger.kernel.org> Fixes: `14c73f997a` ("nfit, address-range-scrub: introduce nfit_spa->ars_state") Fixes: `cc3d3458d4` ("acpi/nfit: queue issuing of ars when an uc error...") Reported-by: Jacek Zloch <jacek.zloch@intel.com> Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-17 13:57:51 -07:00
Dan Williams	f110176633	tools/testing/nvdimm: Populate dirty shutdown data Allow the unit tests to verify the retrieval of the dirty shutdown count via smart commands, and allow the driver-load-time retrieval of the smart health payload to be simulated by nfit_test. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-17 10:47:19 -07:00
Dan Williams	0ead11181f	acpi, nfit: Collect shutdown status Some NVDIMMs, in addition to providing an indication of whether the previous shutdown was clean, also provide a running count of lifetime dirty-shutdown events for the device. In anticipation of this functionality appearing on more devices arrange for the nfit driver to retrieve / cache this data at DIMM discovery time, and export it via sysfs. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-17 10:39:04 -07:00
Dan Williams	6f07f86c49	acpi, nfit: Introduce nfit_mem flags In preparation for adding a flag to indicate whether a DIMM publishes a dirty-shutdown count, convert the existing flags to a bit field. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-16 17:57:58 -07:00
Linus Torvalds	828bf6e904	libnvdimm-for-4.19_misc Collection of misc libnvdimm patches for 4.19 submission * Adding support to read locked nvdimm capacity. * Change test code to make DSM failure code injection an override. * Add support for calculate maximum contiguous area for namespace. * Add support for queueing a short ARS when there is on going ARS for nvdimm. * Allow NULL to be passed in to ->direct_access() for kaddr and pfn params. * Improve smart injection support for nvdimm emulation testing. * Fix test code that supports for emulating controller temperature. * Fix hang on error before devm_memremap_pages() * Fix a bug that causes user memory corruption when data returned to user for ars_status. * Maintainer updates for Ross Zwisler emails and adding Jan Kara to fsdax. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAlt9uUIACgkQYGjFFmlT OErL+xAAgWHSGs8w98VtYA9kLDeTYEXutq93wJZQoBu/FMAXuuU3hYmQYnOQU87h KKYmfDkeusaih1R3IX7mzlegnnzSfQ6MraNSV76M43noJHbRTunknCPZH6ebp4fo b/eljvWlZF/idM+7YcsnoFMnHSRj2pjJGXmKQDlKedHD+KMxpmk6zEl2s5Y0zvPU 4U7UQLtk3D5IIpLNsLEmxge32BfvNf5IzoSO1aZp7Eqk0+U5Tq3Sq/Tjmd+J0RKt 6WH5yA6NqXQgBh+ayHsYU8YX62RqnbKQZXqVxD35OH64zJEUefnP1fpt9pmaZ9eL 43BPMkpM09eLAikO2ET3/3c2k6h3h9ttz1sH8t/hiroCtfmxs3XgskY06hxpKjZV EbN+BUmut5Mr+zzYitRr3dbK2aHPVU9IbU7jUw/1Tz23rq3kU5iI7SHHv1b/eWup 1Cr77Z1M6HB8VBhjnJ+R607sbRrnKQUOV7fGzAaIskyUOTWsEvIgTh/6MRiaj9MD 5HXIgc/0y9E+G93s7MsUWwzpB7J6E7EGoybST2SKPtqwtDMPsBNeWRjyA9quBCoN u1s+e+lWHYutqRW0eisDTTlq3nJwPijSx1nnzhJxw9s1EkCXz3f7KRZhyH1C79Co 7wjiuvKQ79e/HI/oXvGmTnv5lbLEpWYyJ3U3KIFfoUqugeyhr0k= =5p2n -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dave Jiang: "Collection of misc libnvdimm patches for 4.19 submission: - Adding support to read locked nvdimm capacity. - Change test code to make DSM failure code injection an override. - Add support for calculate maximum contiguous area for namespace. - Add support for queueing a short ARS when there is on going ARS for nvdimm. - Allow NULL to be passed in to ->direct_access() for kaddr and pfn params. - Improve smart injection support for nvdimm emulation testing. - Fix test code that supports for emulating controller temperature. - Fix hang on error before devm_memremap_pages() - Fix a bug that causes user memory corruption when data returned to user for ars_status. - Maintainer updates for Ross Zwisler emails and adding Jan Kara to fsdax" * tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm: fix ars_status output length calculation device-dax: avoid hang on error before devm_memremap_pages() tools/testing/nvdimm: improve emulation of smart injection filesystem-dax: Do not request kaddr and pfn when not required md/dm-writecache: Don't request pointer dummy_addr when not required dax/super: Do not request a pointer kaddr when not required tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access() s390, dcssblk: kaddr and pfn can be NULL to ->direct_access() libnvdimm, pmem: kaddr and pfn can be NULL to ->direct_access() acpi/nfit: queue issuing of ars when an uc error notification comes in libnvdimm: Export max available extent libnvdimm: Use max contiguous area for namespace size MAINTAINERS: Add Jan Kara for filesystem DAX MAINTAINERS: update Ross Zwisler's email address tools/testing/nvdimm: Fix support for emulating controller temperature tools/testing/nvdimm: Make DSM failure code injection an override acpi, nfit: Prefer _DSM over _LSR for namespace label reads libnvdimm: Introduce locked DIMM capacity support	2018-08-25 18:13:10 -07:00
Dave Jiang	cc3d3458d4	acpi/nfit: queue issuing of ars when an uc error notification comes in When the ACPI UC error notifier gets called and ARS_REQ bit is set with the passed in flag, we can receive -EBUSY if ARS_REQ bit is already set for the nfit_spa->ars_state. When that happens, the ARS request is dropped. That can potentially cause us to miss the unreported errors that the on going ARS request does not receive. Add an ARS_REQ_REDO state that will request short ARS upon ARS completion to grab any errors we missed. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>	2018-07-27 15:28:28 -07:00
Dan Williams	099b07a25f	acpi, nfit: Prefer _DSM over _LSR for namespace label reads The _LSR method indicates locked status via error-code-3 returned in the _LSR payload. When any error is returned the payload of _LSR is truncated to a zero-length buffer. The _DSM path in comparison allows system software to retrieve the locked status and namespace label area contents. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-07-14 10:27:00 -07:00
Dave Jiang	ee6581ceba	nfit: fix unchecked dereference in acpi_nfit_ctl Incremental patch to fix the unchecked dereference in acpi_nfit_ctl. Reported by Dan Carpenter: "acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value" from Jun 28, 2018, leads to the following Smatch complaint: drivers/acpi/nfit/core.c:578 acpi_nfit_ctl() warn: variable dereferenced before check 'cmd_rc' (see line 411) drivers/acpi/nfit/core.c 410 411 *cmd_rc = -EINVAL; ^^^^^^^^^^^^^^^^^^ Patch adds unchecked dereference. Fixes: `c1985cefd8` ("acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value") Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2018-07-11 10:25:24 -07:00
Dan Williams	33cc2c9667	acpi, nfit: Fix scrub idle detection The notification of scrub completion happens within the scrub workqueue. That can clearly race someone running scrub_show() and work_busy() before the workqueue has a chance to flush the recently completed work. Add a flag to reliably indicate the idle vs busy state. Without this change applications using poll(2) to wait for scrub-completion may falsely wakeup and read ARS as being busy even though the thread is going idle and then hang indefinitely. Fixes: `bc6ba80858` ("nfit, address-range-scrub: rework and simplify ARS...") Cc: <stable@vger.kernel.org> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-07-05 19:33:53 -07:00
Dave Jiang	c1985cefd8	acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value cmd_rc is passed in by reference to the acpi_nfit_ctl() function and the caller expects a value returned. However, when the package is pass through via the ND_CMD_CALL command, cmd_rc is not touched. Make sure cmd_rc is always set. Fixes: `aef2533822` ("libnvdimm, nfit: centralize command status translation") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-30 10:43:39 -07:00
Kees Cook	a86854d0c5	treewide: devm_kzalloc() -> devm_kcalloc() The devm_kzalloc() function has a 2-factor argument form, devm_kcalloc(). This patch replaces cases of: devm_kzalloc(handle, a * b, gfp) with: devm_kcalloc(handle, a * b, gfp) as well as handling cases of: devm_kzalloc(handle, a * b * c, gfp) with: devm_kzalloc(handle, array3_size(a, b, c), gfp) as it's slightly less ugly than: devm_kcalloc(handle, array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: devm_kzalloc(handle, 4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. Some manual whitespace fixes were needed in this patch, as Coccinelle really liked to write "=devm_kcalloc..." instead of "= devm_kcalloc...". The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ expression HANDLE; type TYPE; expression THING, E; @@ ( devm_kzalloc(HANDLE, - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| devm_kzalloc(HANDLE, - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression HANDLE; expression COUNT; typedef u8; typedef __u8; @@ ( devm_kzalloc(HANDLE, - sizeof(u8) * (COUNT) + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(__u8) * (COUNT) + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(char) * (COUNT) + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(u8) * COUNT + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(__u8) * COUNT + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(char) * COUNT + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ expression HANDLE; type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ expression HANDLE; identifier SIZE, COUNT; @@ - devm_kzalloc + devm_kcalloc (HANDLE, - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression HANDLE; expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( devm_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression HANDLE; expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ expression HANDLE; identifier STRIDE, SIZE, COUNT; @@ ( devm_kzalloc(HANDLE, - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression HANDLE; expression E1, E2, E3; constant C1, C2, C3; @@ ( devm_kzalloc(HANDLE, C1 * C2 * C3, ...) \| devm_kzalloc(HANDLE, - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| devm_kzalloc(HANDLE, - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| devm_kzalloc(HANDLE, - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| devm_kzalloc(HANDLE, - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression HANDLE; expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( devm_kzalloc(HANDLE, sizeof(THING) * C2, ...) \| devm_kzalloc(HANDLE, sizeof(TYPE) * C2, ...) \| devm_kzalloc(HANDLE, C1 * C2 * C3, ...) \| devm_kzalloc(HANDLE, C1 * C2, ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - (E1) * E2 + E1, E2 , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - (E1) * (E2) + E1, E2 , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Dan Williams	d4dd70915e	acpi, nfit: Remove ecc_unit_size The "Clear Error Unit" may be smaller than the ECC unit size on some devices. For example, poison may be tracked at 64-byte alignment even though the ECC unit is larger. Unless / until the ACPI specification provides a non-ambiguous way to communicate this property do not expose this to userspace. Software that had been using this property must already be prepared for the case where the property is not provided on older kernels, so it is safe to remove this attribute. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-03 12:49:15 -07:00
Linus Torvalds	9f3a0941fb	libnvdimm for 4.17 * A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. * The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. * Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. * The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. * Miscellaneous cleanups and updates to unit test infrastructure. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJazDt5AAoJEB7SkWpmfYgCqGMQALLwdPeY87cUK7AvQ2IXj46B lJgeVuHPzyQDbC03AS5uUYnnU3I5lFd7i4y7ZrywNpFs4lsb/bNmbUpQE5xp+Yvc 1MJ/JYDIP5X4misWYm3VJo85N49+VqSRgAQk52PBigwnZ7M6/u4cSptXM9//c9JL /NYbat6IjjY6Tx49Tec6+F3GMZjsFLcuTVkQcREoOyOqVJE4YpP0vhNjEe0vq6vr EsSWiqEI5VFH4PfJwKdKj/64IKB4FGKj2A5cEgjQBxW2vw7tTJnkRkdE3jDUjqtg xYAqGp/Dqs4+bgdYlT817YhiOVrcr5mOHj7TKWQrBPgzKCbcG5eKDmfT8t+3NEga 9kBlgisqIcG72lwZNA7QkEHxq1Omy9yc1hUv9qz2YA0G+J1WE8l1T15k1DOFwV57 qIrLLUypklNZLxvrzNjclempboKc4JCUlj+TdN5E5Y6pRs55UWTXaP7Xf5O7z0vf l/uiiHkc3MPH73YD2PSEGFJ8m8EU0N8xhrcz3M9E2sHgYCnbty1Lw3FH0/GhThVA ya1mMeDdb8A2P7gWCBk1Lqeig+rJKXSey4hKM6D0njOEtMQO1H4tFqGjyfDX1xlJ 3plUR9WBVEYzN5+9xWbwGag/ezGZ+NfcVO2gmy6yXiEph796BxRAZx/18zKRJr0m 9eGJG1H+JspcbtLF9iHn =acZQ -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit `d2c997c0f1` ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...	2018-04-10 10:25:57 -07:00
Dan Williams	1ed41b5696	Merge branch 'for-4.17/libnvdimm' into libnvdimm-for-next	2018-04-09 10:50:08 -07:00
Dan Williams	bca811a7fd	nfit, address-range-scrub: add module option to skip initial ars After attempting to quickly retrieve known errors the kernel proceeds to kick off a long running ARS. Add a module option to disable this behavior at initialization time, or at new region discovery time. Otherwise, ARS can be started manually regardless of the state of this setting. Co-developed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-07 08:44:55 -07:00

1 2 3

132 Commits