linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-25 22:15:40 +07:00

Author	SHA1	Message	Date
Dmitry Osipenko	43240bbd87	gpu: host1x: At first try a non-blocking allocation for the gather copy The blocking gather copy allocation is a major performance downside of the Host1x firewall, it may take hundreds milliseconds which is unacceptable for the real-time graphics operations. Let's try a non-blocking allocation first as a least invasive solution, it makes opentegra (Xorg driver) performance indistinguishable with/without the firewall. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2017-06-15 14:25:56 +02:00
Dmitry Osipenko	a47ac10e6e	gpu: host1x: Check waits in the firewall Check waits in the firewall in a way it is done for relocations. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2017-06-15 14:24:41 +02:00
Dmitry Osipenko	0f563a4bf6	gpu: host1x: Forbid unrelated SETCLASS opcode in the firewall Several channels could be made to write the same unit concurrently via the SETCLASS opcode, trusting userspace is a bad idea. It should be possible to drop the per-client channel reservation and add a per-unit locking by inserting MLOCK's to the command stream to re-allow the SETCLASS opcode, but it will be much more work. Let's forbid the unit-unrelated class changes for now. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2017-06-15 14:23:50 +02:00
Dmitry Osipenko	ef81624994	gpu: host1x: Forbid RESTART opcode in the firewall The RESTART opcode terminates the gather and restarts the CDMA fetching from a specified word << 2 relative to the CDMA start address. That shouldn't be allowed to be done by userspace. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2017-06-15 14:23:18 +02:00
Dmitry Osipenko	571cbf70c1	gpu: host1x: Forbid relocation address shifting in the firewall Incorrectly shifted relocation address will cause a lower memory corruption and likely a hang on a write or a read of an arbitrary data in case of IOMMU absence. As of now, there is no known use for the address shifting and adding a proper shifts / sizes validation is a much more work. Let's forbid shifts in the firewall till a proper validation is implemented. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2017-06-15 14:22:32 +02:00
Dmitry Osipenko	47f89c10dd	gpu: host1x: Do not leak BO's phys address to userspace Perform gathers coping before patching them, so that original gathers are left untouched. That's not as bad as leaking kernel addresses, but still doesn't feel right. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2017-06-15 14:22:03 +02:00
Dmitry Osipenko	e5855aa3e6	gpu: host1x: Correct host1x_job_pin() error handling In case of relocations / waitchecks patching failure the jobs pins stay referenced till DRM file get closed, wasting memory. Add the missed unpinning. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2017-06-15 14:21:46 +02:00
Dmitry Osipenko	3833d16f16	gpu: host1x: Initialize firewall class to the job's one The commands stream is prepended by the jobs class on the CDMA submission, so that explicitly setting a module class in the commands stream isn't necessary. The firewall initializes its class to 0 and the command stream that doesn't explicitly specify the class effectively bypasses the firewall. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2017-06-15 14:21:23 +02:00
Mikko Perttunen	404bfb78da	gpu: host1x: Add IOMMU support Add support for the Host1x unit to be located behind an IOMMU. This is required when gather buffers may be allocated non-contiguously in physical memory, as can be the case when TegraDRM is also using the IOMMU. Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2017-04-05 18:11:43 +02:00
Arto Merilainen	f08ef2d1a1	gpu: host1x: Store device address to all bufs Currently job pinning is optimized to handle only the first buffer using a certain host1x_bo object and all subsequent buffers using the same host1x_bo are considered done. In most cases this is correct, however, in case the same host1x_bo is used in multiple gathers inside the same job, we skip also storing the device address (physical or iova) to this buffer. This patch reworks the host1x_job_pin() to store the device address to all gathers. Signed-off-by: Andrew Chew <achew@nvidia.com> Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2016-11-11 15:21:07 +01:00
Thierry Reding	0b8070d12e	gpu: host1x: Whitespace cleanup for readability Insert a number of blank lines in places where they increase readability of the code. Also collapse various variable declarations to shorten some functions and finally rewrite some code for readability. Signed-off-by: Thierry Reding <treding@nvidia.com>	2016-06-23 11:59:30 +02:00
Thierry Reding	6df633d0dc	gpu: host1x: Fix a couple of checkpatch warnings Fix a couple of occurrences where no blank line was used to separate variable declarations from code or where block comments were wrongly formatted. Signed-off-by: Thierry Reding <treding@nvidia.com>	2016-06-23 11:59:28 +02:00
Thierry Reding	5c0d8d386b	gpu: host1x: Use unsigned int consistently for IDs IDs can never be negative so use unsigned int. In some instances an explicitly sized type (such as u32) was used for no particular reason, so turn those into unsigned int as well for consistency. Signed-off-by: Thierry Reding <treding@nvidia.com>	2016-06-23 11:59:24 +02:00
Linus Torvalds	266c73b777	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "This is the main drm pull request for 4.6 kernel. Overall the coolest thing here for me is the nouveau maxwell signed firmware support from NVidia, it's taken a long while to extract this from them. I also wish the ARM vendors just designed one set of display IP, ARM display block proliferation is definitely increasing. Core: - drm_event cleanups - Internal API cleanup making mode_fixup optional. - Apple GMUX vga switcheroo support. - DP AUX testing interface Panel: - Refactoring of DSI core for use over more transports. New driver: - ARM hdlcd driver i915: - FBC/PSR (framebuffer compression, panel self refresh) enabled by default. - Ongoing atomic display support work - Ongoing runtime PM work - Pixel clock limit checks - VBT DSI description support - GEM fixes - GuC firmware scheduler enhancements amdkfd: - Deferred probing fixes to avoid make file or link ordering. amdgpu/radeon: - ACP support for i2s audio support. - Command Submission/GPU scheduler/GPUVM optimisations - Initial GPU reset support for amdgpu vmwgfx: - Support for DX10 gen mipmaps - Pageflipping and other fixes. exynos: - Exynos5420 SoC support for FIMD - Exynos5422 SoC support for MIPI-DSI nouveau: - GM20x secure boot support - adds acceleration for Maxwell GPUs. - GM200 support - GM20B clock driver support - Power sensors work etnaviv: - Correctness fixes for GPU cache flushing - Better support for i.MX6 systems. imx-drm: - VBlank IRQ support - Fence support - OF endpoint support msm: - HDMI support for 8996 (snapdragon 820) - Adreno 430 support - Timestamp queries support virtio-gpu: - Fixes for Android support. rockchip: - Add support for Innosilicion HDMI rcar-du: - Support for 4 crtcs - R8A7795 support - RCar Gen 3 support omapdrm: - HDMI interlace output support - dma-buf import support - Refactoring to remove a lot of legacy code. tilcdc: - Rewrite of pageflipping code - dma-buf support - pinctrl support vc4: - HDMI modesetting bug fixes - Significant 3D performance improvement. fsl-dcu (FreeScale): - Lots of fixes tegra: - Two small fixes sti: - Atomic support for planes - Improved HDMI support" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1063 commits) drm/amdgpu: release_pages requires linux/pagemap.h drm/sti: restore mode_fixup callback drm/amdgpu/gfx7: add MTYPE definition drm/amdgpu: removing BO_VAs shouldn't be interruptible drm/amd/powerplay: show uvd/vce power gate enablement for tonga. drm/amd/powerplay: show uvd/vce power gate info for fiji drm/amdgpu: use sched fence if possible drm/amdgpu: move ib.fence to job.fence drm/amdgpu: give a fence param to ib_free drm/amdgpu: include the right version of gmc header files for iceland drm/radeon: fix indentation. drm/amd/powerplay: add uvd/vce dpm enabling flag to fix the performance issue for CZ drm/amdgpu: switch back to 32bit hw fences v2 drm/amdgpu: remove amdgpu_fence_is_signaled drm/amdgpu: drop the extra fence range check v2 drm/amdgpu: signal fences directly in amdgpu_fence_process drm/amdgpu: cleanup amdgpu_fence_wait_empty v2 drm/amdgpu: keep all fences in an RCU protected array v2 drm/amdgpu: add number of hardware submissions to amdgpu_fence_driver_init_ring drm/amdgpu: RCU protected amd_sched_fence_release ...	2016-03-21 13:48:00 -07:00
Markus Elfring	341917fe2b	gpu: host1x: Use a signed return type for do_relocs() The return type "unsigned int" was used by the do_relocs() function despite the fact that it will eventually return a negative error code. Use a signed integer instead to accomodate for error codes. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Thierry Reding <treding@nvidia.com>	2016-03-16 13:45:44 +01:00
Luis R. Rodriguez	f6e45661f9	dma, mm/pat: Rename dma__writecombine() to dma__wc() Rename dma__writecombine() to dma__wc(), so that the naming is coherent across the various write-combining APIs. Keep the old names for compatibility for a while, these can be removed at a later time. A guard is left to enable backporting of the rename, and later remove of the old mapping defines seemlessly. Build tested successfully with allmodconfig. The following Coccinelle SmPL patch was used for this simple transformation: @ rename_dma_alloc_writecombine @ expression dev, size, dma_addr, gfp; @@ -dma_alloc_writecombine(dev, size, dma_addr, gfp) +dma_alloc_wc(dev, size, dma_addr, gfp) @ rename_dma_free_writecombine @ expression dev, size, cpu_addr, dma_addr; @@ -dma_free_writecombine(dev, size, cpu_addr, dma_addr) +dma_free_wc(dev, size, cpu_addr, dma_addr) @ rename_dma_mmap_writecombine @ expression dev, vma, cpu_addr, dma_addr, size; @@ -dma_mmap_writecombine(dev, vma, cpu_addr, dma_addr, size) +dma_mmap_wc(dev, vma, cpu_addr, dma_addr, size) We also keep the old names as compatibility helpers, and guard against their definition to make backporting easier. Generated-by: Coccinelle SmPL Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bhelgaas@google.com Cc: bp@suse.de Cc: dan.j.williams@intel.com Cc: daniel.vetter@ffwll.ch Cc: dhowells@redhat.com Cc: julia.lawall@lip6.fr Cc: konrad.wilk@oracle.com Cc: linux-fbdev@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: luto@amacapital.net Cc: mst@redhat.com Cc: tomi.valkeinen@ti.com Cc: toshi.kani@hp.com Cc: vinod.koul@intel.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1453516462-4844-1-git-send-email-mcgrof@do-not-panic.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-09 14:57:51 +01:00
Thierry Reding	961e3beae3	drm/tegra: Make job submission 64-bit safe Job submission currently relies on the fact that struct drm_tegra_reloc and struct host1x_reloc are the same size and uses a simple call to the copy_from_user() function to copy them to kernel space. This causes the handle to be stored in the buffer object field, which then needs a cast to a 32 bit integer to resolve it to a proper buffer object pointer and store it back in the buffer object field. On 64-bit architectures that will no longer work, since pointers are 64 bits wide whereas handles will remain 32 bits. This causes the sizes of both structures to because different and copying will no longer work. Fix this by adding a new function, host1x_reloc_get_user(), that copies the structures field by field. While at it, use substructures for the command and target buffers in struct host1x_reloc for better readability. Also use unsized types to make it more obvious that this isn't part of userspace ABI. Signed-off-by: Thierry Reding <treding@nvidia.com>	2014-08-04 10:07:36 +02:00
Erik Faye-Lund	89e6e8c85f	gpu: host1x: do not check previously handled gathers When patching gathers, we don't need to check against gathers with lower indices than the current one, as they are guaranteed to already have been handled. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Acked-By: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2014-02-12 07:50:37 +01:00
Thierry Reding	fae798a156	gpu: host1x: Export public API Make the public API symbols visible so that depending drivers can be built as a module. Signed-off-by: Thierry Reding <treding@nvidia.com>	2013-12-19 09:29:50 +01:00
Thierry Reding	35d747a81d	gpu: host1x: Expose syncpt and channel functionality Expose the buffer objects, syncpoint and channel functionality in the public public header so that drivers can use them. Signed-off-by: Thierry Reding <treding@nvidia.com>	2013-10-31 09:20:11 +01:00
Thierry Reding	d77563ff56	gpu: host1x: firewall: Refactor register check The same code sequence is used in various places to validate a register access in the command stream. This can be refactored into a separate function. Signed-off-by: Thierry Reding <treding@nvidia.com>	2013-10-31 09:20:08 +01:00
Thierry Reding	d7fbcf477a	gpu: host1x: firewall: Rename cmdbuf_id -> cmdbuf The value stored in this field is a pointer to a command buffer, not an ID. Avoid some confusion by reflecting that in the field's name. Signed-off-by: Thierry Reding <treding@nvidia.com>	2013-10-31 09:20:08 +01:00
Thierry Reding	37857cd2c5	gpu: host1x: Fix alignment of function arguments Arguments on subsequent lines should be aligned with the first argument. This one occurrence went unnoticed during code review. Signed-off-by: Thierry Reding <treding@nvidia.com>	2013-10-31 09:20:08 +01:00
Erik Faye-Lund	a9ff999538	gpu: host1x: check relocs after all gathers are consumed The num_relocs count are passed to the kernel per job, not per gather. For multi-gather jobs, we would previously fail if there were relocs in other gathers aside from the first one. Fix this by simply moving the check until all gathers have been consumed. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Reviewed-by: Arto Merilainen <amerilainen@nvidia.com> Acked-By: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2013-10-31 09:20:04 +01:00
Dan Carpenter	745cecc07c	gpu: host1x: returning success instead of -ENOMEM There is a mistake here so it returns PTR_ERR(NULL) which is success instead of -ENOMEM. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2013-08-27 10:20:12 +02:00
Dan Carpenter	f5fda676e9	gpu: host1x: fix an integer overflow check Tegra is a 32 bit arch. On 32 bit systems then size_t is 32 bits so "total" will never be higher than UINT_MAX because of integer overflows. We need cast to u64 first before doing the math. Also the addition earlier: unsigned int num_unpins = num_cmdbufs + num_relocs; That can overflow as well, but I think it's still safe because we check both "num_cmdbufs" and "num_relocs" again in this test. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thierry Reding <treding@nvidia.com>	2013-08-27 10:20:11 +02:00
Arto Merilainen	3364cd2890	gpu: host1x: Copy gathers before verification The firewall verified gather buffers before copying them. This allowed a malicious application to rewrite the buffer content by timing the rewrite carefully. This patch makes the buffer validation occur after copying the buffers. Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>	2013-06-22 12:43:53 +02:00
Terje Bergstrom	afac0e43c6	gpu: host1x: Don't reset firewall between gathers The firewall was reinitialised for each gather. Because the filter was reinitialised, it did not track the class over gather boundaries. This allowed the user application to set host1x class to one class in one gather and use that class in another gather without firewall having knowledge about that. Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>	2013-06-22 12:43:53 +02:00
Arto Merilainen	5060d8ec7c	gpu: host1x: Check reloc table before usage The firewall assumed that the user space always delivers a relocation table when it is accessing address registers. If userspace did not deliver a relocation table and tried to access the address registers, the code performed bad memory accesses. This patch modifies the firewall to check correctly that the firewall table is available before accessing it. In addition, check_reloc() is converted to use boolean return value (true when the reloc is valid, false when invalid). Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Acked-By: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>	2013-06-22 12:43:52 +02:00
Terje Bergstrom	64c173d3a2	gpu: host1x: Check INCR opcode correctly The firewall code used a wrong loop condition (pointer to a structure) while checking INCR opcode. This patch fixes the code to use correct loop condition (number of words remaining). Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>	2013-06-22 12:43:52 +02:00
Terje Bergstrom	6579324a41	gpu: host1x: Add channel support Add support for host1x client modules, and host1x channels to submit work to the clients. Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-by: Thierry Reding <thierry.reding@avionic-design.de> Tested-by: Thierry Reding <thierry.reding@avionic-design.de> Tested-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>	2013-04-22 12:32:43 +02:00

31 Commits