linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-27 12:15:08 +07:00

Author	SHA1	Message	Date
Chris Wilson	8ba306a6a3	drm/i915: Share per-timeline HWSP using a slab suballocator If we restrict ourselves to only using a cacheline for each timeline's HWSP (we could go smaller, but want to avoid needless polluting cachelines on different engines between different contexts), then we can suballocate a single 4k page into 64 different timeline HWSP. By treating each fresh allocation as a slab of 64 entries, we can keep it around for the next 64 allocation attempts until we need to refresh the slab cache. John Harrison noted the issue of fragmentation leading to the same worst case performance of one page per timeline as before, which can be mitigated by adopting a freelist. v2: Keep all partially allocated HWSP on a freelist This is still without migration, so it is possible for the system to end up with each timeline in its own page, but we ensure that no new allocation would needless allocate a fresh page! v3: Throw a selftest at the allocator to try and catch invalid cacheline reuse. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-4-chris@chris-wilson.co.uk	2019-01-28 19:07:06 +00:00
Chris Wilson	7ce5b6850b	drm/i915/selftests: Use mul_u32_u32() for 32b x 32b -> 64b result As realised by commit `9e3d6223d2` ("math64, timers: Fix 32bit mul_u64_u32_shr() and friends"), GCC does not always generate ideal code for performing a 32b x 32b multiply returning a 64b result (i.e. where we idiomatically use u64 result = (u64)x * (u32)x). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170913105154.2910-2-chris@chris-wilson.co.uk Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>	2017-09-13 13:27:20 +01:00
Chris Wilson	4797948071	drm/i915: Squash repeated awaits on the same fence Track the latest fence waited upon on each context, and only add a new asynchronous wait if the new fence is more recent than the recorded fence for that context. This requires us to filter out unordered timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the absence of a universal identifier, we have to use our own i915->mm.unordered_timeline token. v2: Throw around the debug crutches v3: Inline the likely case of the pre-allocation cache being full. v4: Drop the pre-allocation support, we can lose the most recent fence in case of allocation failure -- it just means we may emit more awaits than strictly necessary but will not break. v5: Trim allocation size for leaf nodes, they only need an array of u32 not pointers. v6: Create mock_timeline to tidy selftest writing v7: s/intel_timeline_sync_get/intel_timeline_sync_is_later/ (Tvrtko) v8: Prune the stale sync points when we idle. v9: Include a small benchmark in the kselftests v10: Separate the idr implementation into its own compartment. (Tvrkto) v11: Refactor igt_sync kselftests to avoid deep nesting (Tvrkto) v12: __sync_leaf_idx() to assert that p->height is 0 when checking leaves v13: kselftests to investigate struct i915_syncmap itself (Tvrtko) v14: Foray into ascii art graphs v15: Take into account that the random lookup/insert does 2 prng calls, not 1, when benchmarking, and use for_each_set_bit() (Tvrtko) v16: Improved ascii art Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-4-chris@chris-wilson.co.uk	2017-05-03 11:08:48 +01:00
Chris Wilson	953c7f82eb	drm/i915: Provide a hook for selftests Some pieces of code are independent of hardware but are very tricky to exercise through the normal userspace ABI or via debugfs hooks. Being able to create mock unit tests and execute them through CI is vital. Start by adding a central point where we can execute unit tests and a parameter to enable them. This is disabled by default as the expectation is that these tests will occasionally explode. To facilitate integration with igt, any parameter beginning with i915.igt__ is interpreted as a subtest executable independently via igt/drv_selftest. Two classes of selftests are recognised: mock unit tests and integration tests. Mock unit tests are run as soon as the module is loaded, before the device is probed. At that point there is no driver instantiated and all hw interactions must be "mocked". This is very useful for writing universal tests to exercise code not typically run on a broad range of architectures. Alternatively, you can hook into the live selftests and run when the device has been instantiated - hw interactions are real. v2: Add a macro for compiling conditional code for mock objects inside real objects. v3: Differentiate between mock unit tests and late integration test. v4: List the tests in natural order, use igt to sort after modparam. v5: s/late/live/ v6: s/unsigned long/unsigned int/ v7: Use igt_ prefixes for long helpers. v8: Deobfuscate macros overriding functions, stop using -I$(src) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-1-chris@chris-wilson.co.uk	2017-02-13 20:45:21 +00:00

4 Commits