Commit Graph

48 Commits

Author SHA1 Message Date
Dave Airlie
7a88cbd8d6 Linux 4.14-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZ9kEFAAoJEHm+PkMAQRiGw6wH/0j197qyGd0hkVFMJO6LAgN3
 KQWS4nZ5BkVDocwv0RVnUJTtXqU1eozFgdVEtSoaFXpzlHGuptR2Tau9efDCJ7w3
 /utZxqvhGebZd2T+j+/o/LE8BRQxhADBNJq2D/o0WNt8ecxuG0GIkhkEYt/o3z1v
 /sxlwVwzXB7Dc/h1WcgGJG7cS6L9KzzAzGAS/iNvdFrPOygHBv8c0MxVZIiBIeeK
 1nZdyvbyM8uenSyG+prGt9ENrqXZxxfwUxIchi2V7A9m1WmD5zijNkf1JCWji/O+
 UsA1auxna7MwoxjxqZuGm4MlKOwZ+8xutk4JGgc+aP/ulndJbJYu+4op/3vaFBM=
 =Mhx+
 -----END PGP SIGNATURE-----

Backmerge tag 'v4.14-rc7' into drm-next

Linux 4.14-rc7

Requested by Ben Skeggs for nouveau to avoid major conflicts,
and things were getting a bit conflicty already, esp around amdgpu
reverts.
2017-11-02 12:40:41 +10:00
Lionel Landwerlin
7277f75504 drm/i915/perf: fix perf enable/disable ioctls with 32bits userspace
The compat callback was missing and triggered failures in 32bits
userspace when enabling/disable the perf stream. We don't require any
particular processing here as these ioctls don't take any argument.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: eec688e142 ("drm/i915: Add i915 perf infrastructure")
Cc: linux-stable <stable@vger.kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171024152728.4873-1-lionel.g.landwerlin@intel.com
(cherry picked from commit 191f896085)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-10-25 08:16:13 -07:00
Michal Wajdeczko
4f044a88a8 drm/i915: Rename global i915 to i915_modparams
Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.

v5: pure rename
v6: fix

Credits-to: Coccinelle

@@
identifier n;
@@
(
-	i915.n
+	i915_modparams.n
)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
2017-09-22 14:50:36 +03:00
Lionel Landwerlin
22ea4f3528 drm/i915/perf: add support for Coffeelake GT2
Add the test configuration & timestamp frequency for Coffeelake GT2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170918112124.29541-3-lionel.g.landwerlin@intel.com
2017-09-18 19:46:36 +01:00
Lionel Landwerlin
342a2c840e drm/i915/perf: disable clk ratio reports on gen9
We're doing this on all Gen9 based platforms, let's just check the gen
rather than listing every single platforms.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170918112124.29541-2-lionel.g.landwerlin@intel.com
2017-09-18 19:46:35 +01:00
Chris Wilson
28b6cb0820 drm/i915/perf: Drop redundant check for perf.initialised on reset
As we cannot have an exclusive stream set if the perf has not been
initialized, we only need to check for that exclusive stream.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170810175743.25401-3-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-08-10 21:43:11 +01:00
Chris Wilson
84a095e413 drm/i915/perf: Drop lockdep assert for i915_oa_init_reg_state()
This is called from execlist context init which we need to be unlocked.
Commit f89823c212 ("drm/i915/perf: Implement
I915_PERF_ADD/REMOVE_CONFIG interface") added a lockdep assert to this
path for unclear reasons, remove it again!

Fixes: f89823c212 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170810175743.25401-2-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-08-10 21:42:36 +01:00
Chris Wilson
40f75ea466 drm/i915/perf: Initialise dynamic sysfs group before creation
Another case where we need to call sysfs_attr_init() to setup the
internal lockdep class prior to use:

[    9.325229] BUG: key ffff880168bc7bb0 not in .data!
[    9.325240] DEBUG_LOCKS_WARN_ON(1)
[    9.325250] ------------[ cut here ]------------
[    9.325280] WARNING: CPU: 1 PID: 275 at kernel/locking/lockdep.c:3156 lockdep_init_map+0x1b2/0x1c0
[    9.325301] Modules linked in: intel_powerclamp(+) coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915(+) snd_hda_intel snd_hda_codec snd_hwdep r8169 mii snd_hda_core snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[    9.325375] CPU: 1 PID: 275 Comm: modprobe Not tainted 4.13.0-rc4-CI-Trybot_1040+ #1
[    9.325395] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0045.B51.1704281422 04/28/2017
[    9.325422] task: ffff8801721a4ec0 task.stack: ffffc900001dc000
[    9.325440] RIP: 0010:lockdep_init_map+0x1b2/0x1c0
[    9.325456] RSP: 0018:ffffc900001dfa10 EFLAGS: 00010282
[    9.325473] RAX: 0000000000000016 RBX: ffff880168d54b80 RCX: 0000000000000000
[    9.325488] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810f0800
[    9.325505] RBP: ffffc900001dfa30 R08: 0000000000000001 R09: 0000000000000000
[    9.325521] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880168bc7bb0
[    9.325537] R13: 0000000000000000 R14: ffff880168bc7b98 R15: ffffffff81a263a0
[    9.325554] FS:  00007fb60c3fd700(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
[    9.325574] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.325588] CR2: 0000006582777d80 CR3: 000000016d818000 CR4: 00000000003406e0
[    9.325604] Call Trace:
[    9.325618]  __kernfs_create_file+0x76/0xe0
[    9.325632]  sysfs_add_file_mode_ns+0x8a/0x1a0
[    9.325646]  internal_create_group+0xea/0x2c0
[    9.325660]  sysfs_create_group+0x13/0x20
[    9.325737]  i915_perf_register+0xde/0x220 [i915]
[    9.325800]  i915_driver_load+0xa77/0x16c0 [i915]
[    9.325863]  i915_pci_probe+0x37/0x90 [i915]
[    9.325880]  pci_device_probe+0xa8/0x130
[    9.325894]  driver_probe_device+0x29c/0x450
[    9.325908]  __driver_attach+0xe3/0xf0
[    9.325922]  ? driver_probe_device+0x450/0x450
[    9.325935]  bus_for_each_dev+0x62/0xa0
[    9.325948]  driver_attach+0x1e/0x20
[    9.325960]  bus_add_driver+0x173/0x270
[    9.325974]  driver_register+0x60/0xe0
[    9.325986]  __pci_register_driver+0x60/0x70
[    9.326044]  i915_init+0x6f/0x78 [i915]
[    9.326066]  ? 0xffffffffa024e000
[    9.326079]  do_one_initcall+0x43/0x170
[    9.326094]  ? rcu_read_lock_sched_held+0x7a/0x90
[    9.326109]  ? kmem_cache_alloc_trace+0x261/0x2d0
[    9.326124]  do_init_module+0x5f/0x206
[    9.326137]  load_module+0x2561/0x2da0
[    9.326150]  ? show_coresize+0x30/0x30
[    9.326165]  ? kernel_read_file+0x105/0x190
[    9.326180]  SyS_finit_module+0xc1/0x100
[    9.326192]  ? SyS_finit_module+0xc1/0x100
[    9.326210]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[    9.326223] RIP: 0033:0x7fb60bf359f9
[    9.326234] RSP: 002b:00007fff92b47c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    9.326255] RAX: ffffffffffffffda RBX: ffffffff814898a3 RCX: 00007fb60bf359f9
[    9.326271] RDX: 0000000000000000 RSI: 00000028a9ceef8b RDI: 0000000000000000
[    9.326287] RBP: ffffc900001dff88 R08: 0000000000000000 R09: 0000000000000000
[    9.326303] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000040000
[    9.326319] R13: 00000028aaef2a70 R14: 0000000000000000 R15: 00000028aaeee5d0
[    9.326339]  ? __this_cpu_preempt_check+0x13/0x20
[    9.326353] Code: f1 39 00 85 c0 0f 84 38 ff ff ff 83 3d 9f 44 ce 01 00 0f 85 2b ff ff ff 48 c7 c6 b2 a2 c7 81 48 c7 c7 53 40 c5 81 e8 3f 82 01 00 <0f> ff e9 11 ff ff ff 0f 1f 80 00 00 00 00 55 31 c9 31 d2 31 f6

Fixes: 701f8231a2 ("drm/i915/perf: prune OA configs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170810175743.25401-1-chris@chris-wilson.co.uk
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-10 21:41:36 +01:00
Chris Wilson
28152a238b drm/i915/perf: Initialise the dynamic sysfs attr
Use sysfs_attr_init() to dynamically initialise the
oa_config->sysfs_metric_id.attr as it has the important side-effect of
setting the lockdep key.

[    4.971513] [drm] Initialized i915 1.6.0 20170731 for 0000:00:02.0 on minor 0
[    4.973489] BUG: key ffff88026f6e7bb0 not in .data!
[    4.973506] DEBUG_LOCKS_WARN_ON(1)
[    4.973518] ------------[ cut here ]------------
[    4.973547] WARNING: CPU: 1 PID: 258 at kernel/locking/lockdep.c:3156 lockdep_init_map+0x1b2/0x1c0
[    4.973567] Modules linked in: i915(+) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mei_me mii mei lpc_ich prime_numbers i2c_hid pinctrl_broxton pinctrl_intel
[    4.973645] CPU: 1 PID: 258 Comm: systemd-udevd Not tainted 4.13.0-rc3-CI-CI_DRM_2915+ #1
[    4.973664] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.10 09/29/2016
[    4.973686] task: ffff8802704c2740 task.stack: ffffc90000224000
[    4.973700] RIP: 0010:lockdep_init_map+0x1b2/0x1c0
[    4.973712] RSP: 0018:ffffc90000227a10 EFLAGS: 00010282
[    4.973726] RAX: 0000000000000016 RBX: ffff880262aac010 RCX: 0000000000000000
[    4.973741] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810ed1ab
[    4.973757] RBP: ffffc90000227a30 R08: 0000000000000001 R09: 0000000000000000
[    4.973774] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88026f6e7bb0
[    4.973789] R13: 0000000000000000 R14: ffff88026f6e7b98 R15: ffffffff81a24da0
[    4.973805] FS:  00007f588d7f58c0(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000
[    4.973823] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.973837] CR2: 00000082482e32a0 CR3: 0000000270531000 CR4: 00000000003406e0
[    4.973852] Call Trace:
[    4.973864]  __kernfs_create_file+0x71/0xe0
[    4.973876]  sysfs_add_file_mode_ns+0x85/0x1a0
[    4.973890]  internal_create_group+0xe5/0x2b0
[    4.973903]  sysfs_create_group+0xe/0x10
[    4.973985]  i915_perf_register+0xd9/0x220 [i915]
[    4.974044]  i915_driver_load+0xa72/0x16b0 [i915]
[    4.974124]  i915_pci_probe+0x32/0x90 [i915]

Annoyingly detected by CI, but not reported due to it occurring during boot
and disabling lockdep for later runs.

Fixes: f89823c212 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Andrzej Datczuk <andrzej.datczuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803223700.10329-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-08-04 13:10:10 +01:00
Lionel Landwerlin
f89823c212 drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface
The motivation behind this new interface is expose at runtime the
creation of new OA configs which can be used as part of the i915 perf
open interface. This will enable the kernel to learn new configs which
may be experimental, or otherwise not part of the core set currently
available through the i915 perf interface.

v2: Drop DRM_ERROR for userspace errors (Matthew)
    Add padding to userspace structure (Matthew)
    s/guid/uuid/ (Matthew)

v3: Use u32 instead of int to iterate through registers (Matthew)

v4: Lock access to dynamic config list (Lionel)

v5: by Matthew:
    Fix uninitialized error values
    Fix incorrect unwiding when opening perf stream
    Use kmalloc_array() to store register
    Use uuid_is_valid() to valid config uuids
    Declare ioctls as write only
    Check padding members are set to 0
    by Lionel:
    Return ENOENT rather than EINVAL when trying to remove non
    existing config

v6: by Chris:
    Use ref counts for OA configs
    Store UUID in drm_i915_perf_oa_config rather then using pointer
    Shuffle fields of drm_i915_perf_oa_config to avoid padding

v7: by Chris
    Rename uapi pointers fields to end with '_ptr'

v8: by Andrzej, Marek, Sebastian
    Update register whitelisting
    by Lionel
    Add more register names for documentation
    Allow configuration programming in non-paranoid mode
    Add support for value filter for a couple of registers already
    programmed in other part of the kernel

v9: Documentation fix (Lionel)
    Allow writing WAIT_FOR_RC6_EXIT only on Gen8+ (Andrzej)

v10: Perform read access_ok() on register pointers (Lionel)

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Andrzej Datczuk <andrzej.datczuk@intel.com>
Reviewed-by: Andrzej Datczuk <andrzej.datczuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-2-lionel.g.landwerlin@intel.com
2017-08-03 18:19:53 +01:00
Lionel Landwerlin
28964cf25e drm/i915/perf: disable NOA logic when not used
We already do it on Haswell and the documentation says it saves power.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-5-lionel.g.landwerlin@intel.com
2017-08-03 18:19:10 +01:00
Lionel Landwerlin
3802c5cb20 drm/i915/perf: leave GDT_CHICKEN_BITS programming in configs
There will be a need for userspaces configurations to set this
register. We can apply the same model inside the kernel for test
configs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-4-lionel.g.landwerlin@intel.com
2017-08-03 18:18:44 +01:00
Lionel Landwerlin
701f8231a2 drm/i915/perf: prune OA configs
In the following commit we'll introduce loadable userspace
configs. This change reworks how configurations are handled in the
perf driver and retains only the test configurations in kernel space.

We now store the test config in dev_priv and resolve the id only once
when opening the perf stream. The OA config is then handled through a
pointer to the structure holding the configuration details.

v2: Rework how test configs are handled (Lionel)

v3: Use u32 to hold number of register (Matthew)

v4: Removed unused dev_priv->perf.oa.current_config variable (Matthew)

v5: Lock device when accessing exclusive_stream (Lionel)

v6: Ensure OACTXCONTROL is always reprogrammed (Lionel)

v7: Switch a couple of index variable from int to u32 (Matthew)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-3-lionel.g.landwerlin@intel.com
2017-08-03 18:18:05 +01:00
Lionel Landwerlin
01d928e9a1 drm/i915/perf: fix flex eu registers programming
We were reserving fewer dwords in the ring than necessary. Indeed
we're always writing all registers once, so discard the actual number
of registers given by the user and just program the whitelisted ones
once.

Fixes: 19f81df285 ("drm/i915/perf: Add OA unit support for Gen 8+")
Reported-by: Matthew Auld <matthew.william.auld@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v4.12+
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-6-lionel.g.landwerlin@intel.com
2017-08-03 18:17:47 +01:00
Imre Deak
635f56c342 drm/i915: Fix error checking/locking in perf/lookup_context()
1acfc104cd missed to convert this one caller to be lockless. The side
effect of that was that the error check in lookup_context() became
incorrect. Convert now this caller too.

Fixes: 1acfc104cd ("drm/i915: Enable rcu-only context lookups")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170714151242.517-1-imre.deak@intel.com
2017-07-17 14:22:17 +03:00
sagar.a.kamble@intel.com
987f8c444a drm/i915: Hold RPM wakelock while initializing OA buffer
OA buffer initialization involves access to HW registers to set
the OA base, head and tail. Ensure device is awake while setting
these. With this, all oa.ops are covered under RPM and forcewake
wakelock.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1498585181-23048-1-git-send-email-sagar.a.kamble@intel.com
Fixes: d79651522e ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Cc: <stable@vger.kernel.org> # v4.11+
2017-07-03 12:00:17 +02:00
Chris Wilson
5f09a9c8ab drm/i915: Allow contexts to be unreferenced locklessly
If we move the actual cleanup of the context to a worker, we can allow
the final free to be called from any context and avoid undue latency in
the caller.

v2: Negotiate handling the delayed contexts free by flushing the
workqueue before calling i915_gem_context_fini() and performing the final
free of the kernel context directly
v3: Flush deferred frees before new context allocations

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-2-chris@chris-wilson.co.uk
2017-06-20 17:13:47 +01:00
Chris Wilson
829a0af29f drm/i915: Group all the global context information together
Create a substruct to hold all the global context state under
drm_i915_private.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk
2017-06-20 17:13:40 +01:00
Lionel Landwerlin
28c7ef9ecc drm/i915/perf: add GLK support
Add OA support for Geminilake (pretty much identical to Broxton), and
also add the associated OA configurations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170613112309.4088-2-lionel.g.landwerlin@intel.com
2017-06-14 12:31:58 -07:00
Lionel Landwerlin
6c5c1d89af drm/i915/perf: add KBL support
Add OA support for Kabylake (pretty much identical to Skylake), and
also add the associated OA configurations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2017-06-14 12:31:58 -07:00
Robert Bragg
1bef3409f1 drm/i915/perf: remove perf.hook_lock
In earlier iterations of the i915-perf driver we had a number of
callbacks/hooks from other parts of the i915 driver to e.g. notify us
when a legacy context was pinned and these could run asynchronously with
respect to the stream file operations and might also run in atomic
context.

dev_priv->perf.hook_lock had been for serialising access to state needed
within these callbacks, but as the code has evolved some of the hooks
have gone away or are implemented to avoid needing to lock any state.

The remaining use of this lock was actually redundant considering how
the gen7 oacontrol state used to be updated as part of a context pin
hook.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2017-06-14 12:31:57 -07:00
Robert Bragg
155e941f49 drm/i915/perf: per-gen timebase for checking sample freq
An oa_exponent_to_ns() utility and per-gen timebase constants where
recently removed when updating the tail pointer race condition WA, and
this restores those so we can update the _PROP_OA_EXPONENT validation
done in read_properties_unlocked() to not assume we have a 12.5MHz
timebase as we did for Haswell.

Accordingly the oa_sample_rate_hard_limit value that's referenced by
proc_dointvec_minmax defining the absolute limit for the OA sampling
frequency is now initialized to (timestamp_frequency / 2) instead of the
6.25MHz constant for Haswell.

v2:
    Specify frequency of 19.2MHz for BXT (Ville)
    Initialize oa_sample_rate_hard_limit per-gen too (Lionel)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2017-06-14 12:31:57 -07:00
Robert Bragg
19f81df285 drm/i915/perf: Add OA unit support for Gen 8+
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

v5: Drain submitted requests when enabling metric set to ensure no
    lite-restore erases the context image we just updated (Lionel)

v6: In addition to drain, switch to kernel context & update all
    context in place (Chris)

v7: Add missing mutex_unlock() if switching to kernel context fails
    (Matthew)

v8: Simplify OA period/flex-eu-counters programming by using the
    batchbuffer instead of modifying ctx-image (Lionel)

v9: Back to updating the context image (due to erroneous testing,
    batchbuffer programming the OA unit doesn't actually work)
    (Lionel)
    Pin context before updating context image (Chris)
    Drop MMIO programming now that we switch to a kernel context with
    right values in initial context image (Chris)

v10: Just pin_map the contexts we want to modify or let the
     configuration happen on first use (Chris)

v11: Update kernel context OA config through the batchbuffer rather
     than on the fly ctx-image update (Lionel)

v12: Rework OA context registers update again by swithing away from
     user contexts and reconfiguring the kernel context through the
     batchbuffer and updating all the other contexts' context image.
     Also take care to lock slice/subslice configuration when OA is
     on. (Lionel)

v13: Request rpcs updates on all engine when updating the OA config
     (Lionel)

v14: Drop any kind of rpcs management now that we monitor sseu
     configuration changes in a later patch (Lionel)
     Remove usleep after programming the NOA configs on Gen8+, this
     doesn't seem to be needed (Lionel)

v15: Respect coding style for block comments (Chris)

v16: Add missing i915_add_request() in case we fail to emit OA
     configuration (Matthew)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2017-06-14 12:31:57 -07:00
Lionel Landwerlin
3f488d9985 drm/i915/perf: rework mux configurations queries
Gen8+ might have mux configurations per slices/subslices. Depending on
whether slices/subslices have been fused off, only part of the
configuration needs to be applied. This change reworks the mux
configurations query mechanism to allow more than one set of registers
to be programmed.

v2: s/n_mux_regs/n_mux_configs/ (Matthew)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2017-06-14 12:31:57 -07:00
Robert Bragg
712122eaa1 drm/i915/perf: rate limit spurious oa report notice
This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.

The workaround for the OA unit's tail pointer race condition is what
avoids the primary known cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.

Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.

v2: (Chris) avoid inconsistent warning on throttle with
    printk_ratelimit()
v3: (Matt) init and summarise with stream init/close not driver init/fini

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-9-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-13 11:03:43 +01:00
Robert Bragg
4117ebc74c drm/i915/perf: better pipeline aged/aging tail updates
This updates the tail pointer race workaround handling to updating the
'aged' pointer before looking to start aging a new one. There's the
possibility that there is already new data available and so we can
immediately start aging a new pointer without having to first wait for a
later hrtimer callback (and then another to age).

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-8-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-13 11:03:17 +01:00
Robert Bragg
52c57c263f drm/i915/perf: improve invalid OA format debug message
A minor improvement to debugging output

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-7-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-13 11:02:40 +01:00
Robert Bragg
0dd860cf73 drm/i915/perf: improve tail race workaround
There's a HW race condition between OA unit tail pointer register
updates and writes to memory whereby the tail pointer can sometimes get
ahead of what's been written out to the OA buffer so far (in terms of
what's visible to the CPU).

Although this can be observed explicitly while copying reports to
userspace by checking for a zeroed report-id field in tail reports, we
want to account for this earlier, as part of the _oa_buffer_check to
avoid lots of redundant read() attempts.

Previously the driver used to define an effective tail pointer that
lagged the real pointer by a 'tail margin' measured in bytes derived
from OA_TAIL_MARGIN_NSEC and the configured sampling frequency.
Unfortunately this was flawed considering that the OA unit may also
automatically generate non-periodic reports (such as on context switch)
or the OA unit may be enabled without any periodic sampling.

This improves how we define a tail pointer for reading that lags the
real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which
gives enough time for the corresponding reports to become visible to the
CPU.

The driver now maintains two tail pointers:
 1) An 'aging' tail with an associated timestamp that is tracked until we
    can trust the corresponding data is visible to the CPU; at which point
    it is considered 'aged'.
 2) An 'aged' tail that can be used for read()ing.

The two separate pointers let us decouple read()s from tail pointer aging.

The tail pointers are checked and updated at a limited rate within a
hrtimer callback (the same callback that is used for delivering POLLIN
events) and since we're now measuring the wall clock time elapsed since
a given tail pointer was read the mechanism no longer cares about
the OA unit's periodic sampling frequency.

The natural place to handle the tail pointer updates was in
gen7_oa_buffer_is_empty() which is called as part of blocking reads and
the hrtimer callback used for polling, and so this was renamed to
oa_buffer_check() considering the added side effect while checking
whether the buffer contains data.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-6-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-13 11:01:28 +01:00
Robert Bragg
3bb335c1e7 drm/i915/perf: no head/tail ref in gen7_oa_read
This avoids redundantly passing an (inout) head and tail pointer to
gen7_append_oa_reports() from gen7_oa_read which doesn't need to
reference either itself.

Moving the head/tail reads and writes into gen7_append_oa_reports should
have no functional effect except to avoid some redundant head pointer
writes in cases where nothing was copied to userspace.

This is a stepping stone towards updating how the head and tail pointer
state is managed to improve the workaround for the OA unit's tail
pointer race. It reduces the number of places we need to read/write the
head and tail pointers.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-5-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-13 11:00:43 +01:00
Robert Bragg
f279020a02 drm/i915/perf: avoid read back of head register
There's no need for the driver to keep reading back the head pointer
from hardware since the hardware doesn't update it automatically. This
way we can treat any invalid head pointer value as a software/driver
bug instead of spurious hardware behaviour.

This change is also a small stepping stone towards re-working how
the head and tail state is managed as part of an improved workaround
for the tail register race condition.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-4-lionel.g.landwerlin@intel.com
2017-05-13 10:59:39 +01:00
Robert Bragg
26ebd9c734 drm/i915/perf: avoid poll, read, EAGAIN busy loops
If the function for checking whether there is OA buffer data available
(during a poll or blocking read) has false positives then we want to
avoid a situation where the subsequent read() returns EAGAIN (after
a more accurate check) followed by a poll() immediately reporting
the same false positive POLLIN event and effectively maintaining a
busy loop until there really is data.

This makes sure that we clear the .pollin event status whenever we
return EAGAIN to userspace which will throttle subsequent POLLIN events
and repeated attempts to read to the 5ms intervals of the hrtimer
callback we have.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-3-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-13 10:59:07 +01:00
Robert Bragg
e81b3a555f drm/i915/perf: fix gen7_append_oa_reports comment
If I'm going to complain about a back-to-front convention then the least
I can do is not muddle the comment up too.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-2-lionel.g.landwerlin@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-13 10:58:30 +01:00
Chris Wilson
266a240bf0 drm/i915: Use engine->context_pin() to report the intel_ring
Since unifying ringbuffer/execlist submission to use
engine->pin_context, we ensure that the intel_ring is available before
we start constructing the request. We can therefore move the assignment
of the request->ring to the central i915_gem_request_alloc() and not
require it in every engine->request_alloc() callback. Another small step
towards simplification (of the core, but at a cost of handling error
pointers in less important callers of engine->pin_context).

v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code
generation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk
2017-05-04 11:54:43 +01:00
Matthew Auld
0a309f9e3d drm/i915/perf: remove user triggerable warn
Don't throw a warning if we are given an invalid property id. While
here let's also bring back Robert' original idea of catching unhandled
enumeration values at compile time.

Fixes: eec688e142 ("drm/i915: Add i915 perf infrastructure")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203236.18276-1-matthew.auld@intel.com
2017-03-28 14:52:43 +03:00
Matthew Auld
22f880ca82 drm/i915/perf: destroy stream on sample_flags mismatch
If we were to ever encounter a sample_flags mismatch we need to ensure
we destroy the stream when we bail.

Fixes: d79651522e ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203459.18398-1-matthew.auld@intel.com
2017-03-28 10:31:51 +03:00
Chris Wilson
675204153e drm/i915: s/assert_spin_locked/lockdep_assert_held/
assert_spin_locked() becomes an unconditionally compiled BUG_ON(),
adding debug code right into the heart of critical routines like
interrupt handlers.

   text	   data	    bss	    dec	    hex
1296480	  19944	   2272	1318696	 141f28	before (lockdep disabled)
1295984	  19944	   2272	1318200	 141d38	after

1336261	  21139	   3208	1360608	 14c2e0	before (lockdep enabled)
1339920	  21139	   3208	1364267	 14d12b	after

Small saving for release; hopefully more instructive in debug.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302132801.599-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-02 15:18:55 +00:00
Chris Wilson
69df05e11a drm/i915: Simplify releasing context reference
A few users only take the struct_mutex in order to release a reference
to a context. We can expose a kref_put_mutex() wrapper in order to
simplify these users, and optimise taking of the mutex to the final
unref.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-4-chris@chris-wilson.co.uk
2016-12-18 16:18:53 +00:00
Chris Wilson
e8a9c58fcd drm/i915: Unify active context tracking between legacy/execlists/guc
The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.

And finally an ulterior motive for unifying context handling was to
prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
2016-12-18 16:18:50 +00:00
Robert Bragg
16d98b31f8 drm/i915/perf: More documentation hooked to i915.rst
This adds a 'Perf' section to i915.rst with the following sub sections:
- Overview
- Comparison with Core Perf
- i915 Driver Entry Points
- i915 Perf Stream
- i915 Perf Observation Architecture Stream
- All i915 Perf Internals

v2:
    section headers in i915.rst (Daniel Vetter)
    missing symbol docs + other fixups (Matthew Auld)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207214033.3581-1-robert@sixbynine.org
2016-12-09 10:00:45 +01:00
Robert Bragg
7708550ce5 drm/i915/perf: use DRM_DEBUG for userspace issues
Avoid using DRM_ERROR for conditions userspace can trigger with a bad
config when opening a stream or from not reading data in a timely
fashion (whereby the OA buffer fills up). These conditions are tested
by i-g-t which treats error messages as failures if using the test
runner. This wasn't an issue while the i915-perf igt tests were being
run in isolation.

One message relating to seeing a spurious zeroed report was changed to
use DRM_NOTE instead of DRM_ERROR. Ideally this warning shouldn't be
seen, but it's not a serious problem if it is. Considering that the
tail margin mechanism is only a heuristic it's possible we might see
this from time to time.

Signed-off-by: Robert Bragg <robert@sixbynine.org:
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161201172152.10893-1-robert@sixbynine.org
2016-12-07 17:03:43 +01:00
Tvrtko Ursulin
12d79d7828 drm/i915: Make GEM object create and create from data take dev_priv
Makes all GEM object constructors consistent.

v2: Fix compilation in GVT code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
2016-12-01 18:01:08 +00:00
Chris Wilson
2460393583 drm/i915/perf: Wrap 64bit divides in do_div()
Just a couple of naked 64bit divides causing link errors on 32bit
builds, with:

	ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined!

v2: do_div() is only u64/u32, we need a u32/u64!
v3: div_u64() == u64/u32, div64_u64() == u64/u64

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: d79651522e ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Robert Bragg <robert@sixbynine.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161123150714.24449-1-chris@chris-wilson.co.uk
Reviewed-by: Robert Bragg <robert@sixbynine.org>
2016-11-29 11:41:33 +00:00
Robert Bragg
7abbd8d670 drm/i915: Add a kerneldoc summary for i915_perf.c
In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-12-robert@sixbynine.org
2016-11-22 14:40:34 +01:00
Robert Bragg
00319ba043 drm/i915: add dev.i915.oa_max_sample_rate sysctl
The maximum OA sampling frequency is now configurable via a
dev.i915.oa_max_sample_rate sysctl parameter.

Following the precedent set by perf's similar
kernel.perf_event_max_sample_rate the default maximum rate is 100000Hz

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-10-robert@sixbynine.org
2016-11-22 14:39:27 +01:00
Robert Bragg
ccdf6341ed drm/i915: Add dev.i915.perf_stream_paranoid sysctl option
Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-9-robert@sixbynine.org
2016-11-22 14:39:00 +01:00
Robert Bragg
442b8c06fc drm/i915: advertise available metrics via sysfs
Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to lookup corresponding counter meta data and
normalization equations.

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

 https://github.com/rib/gputop
 > gputop-data/guids.xml
 > scripts/update-guids.py
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-8-robert@sixbynine.org
2016-11-22 14:38:51 +01:00
Robert Bragg
d79651522e drm/i915: Enable i915 perf stream for Haswell OA unit
Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

v2:
   Make sure to initialize ->specific_ctx_id when opening, without
   relying on _pin_notify hook, in case ctx already pinned.
v3:
   Revert back to pinning ctx upfront when opening stream, removing
   need to hook in to pinning and to update OACONTROL on the fly.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-7-robert@sixbynine.org
2016-11-22 14:38:13 +01:00
Robert Bragg
eec688e142 drm/i915: Add i915 perf infrastructure
Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_PROP_SAMPLE_OA,         true,

      /* OA unit configuration */
      DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
      DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
      DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
   };
   struct drm_i915_perf_open_param parm = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties_ptr = (uint64_t)properties,
      .num_properties = sizeof(properties) / 16,
   };
   int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.

No specific streams are supported yet so any attempt to open a stream
will return an error.

v2:
    use i915_gem_context_get() - Chris Wilson
v3:
    update read() interface to avoid passing state struct - Chris Wilson
    fix some rebase fallout, with i915-perf init/deinit
v4:
    s/DRM_IORW/DRM_IOW/ - Emil Velikov

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org
2016-11-22 14:27:18 +01:00