linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-16 18:36:17 +07:00

Author	SHA1	Message	Date
Felix Kuehling	d950800e79	drm/amdgpu: Fix KFD-related kernel oops on Hawaii Hawaii needs to flush caches explicitly, submitting an IB in a user VMID from kernel mode. There is no s_fence in this case. Fixes: `eb3961a574` ("drm/amdgpu: remove fence context from the job") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:10:06 -05:00
Andrey Grodzovsky	708901a666	drm/amdgpu: Fix mutex lock from atomic context. Problem: amdgpu_ras_reserve_bad_pages was moved to amdgpu_ras_reset_gpu because writing to EEPROM during ASIC reset was unstable. But for ERREVENT_ATHUB_INTERRUPT amdgpu_ras_reset_gpu is called directly from ISR context and so locking is not allowed. Also it's irrelevant for this partilcular interrupt as this is generic RAS interrupt and not memory errors specific. Fix: Avoid calling amdgpu_ras_reserve_bad_pages if not in task context. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:09:59 -05:00
Jiange Zhao	3636169cc0	drm/amdgpu: Add SRIOV mailbox backend for Navi1x Mimic the ones for Vega10, add mailbox backend for Navi1x Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:09:52 -05:00
Guchun Chen	1a3f2e8c3c	drm/amdgpu: implement ras query function for pcie bif ras error query funtionality implementation Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:09:44 -05:00
Guchun Chen	d7b1ed4ac3	drm/amdgpu: add pcie bif ras related registers These registers will be accessed for querying ras errors. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:09:37 -05:00
Guchun Chen	d7bd680d40	drm/amdgpu: support pcie bif ras query and inject Call pcie bif ras query/inject in amdgpu ras. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:09:30 -05:00
Guchun Chen	52652ef286	drm/amdgpu: add ras error query count interface for nbio Add the interface query_ras_error_count for nbio. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:09:21 -05:00
Tianci.Yin	ff9d097193	drm/amdgpu: fix CPDMA hang in PRT mode for VEGA10 add and_mask since the programming logic of golden setting changed Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Tianci.Yin <tianci.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:09:14 -05:00
Hawking Zhang	f317035288	drm/amdgpu: enable error injection to XGMI block via debugfs allow inject error to XGMI block via debugfs node ras_ctrl Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:09:08 -05:00
Hawking Zhang	029fbd437e	drm/amdgpu: initialize ras structures for xgmi block (v2) init ras common interface and fs node for xgmi block v2: remove unnecessary physical node number check before invoking amdgpu_xgmi_ras_late_init Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:08:51 -05:00
Huang Rui	acb9acbefe	drm/amdkfd: fix the missed asic name while inited renoir_device_info This patch fixes null pointer issue below, I missed to init the asic renior name while I rebase the patches. [ 106.004250] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 106.004254] #PF: supervisor read access in kernel mode [ 106.004256] #PF: error_code(0x0000) - not-present page [ 106.004257] PGD 0 P4D 0 [ 106.004261] Oops: 0000 [#1] SMP NOPTI [ 106.004264] CPU: 3 PID: 1422 Comm: modprobe Not tainted 5.2.0-rc1-custom #1 [ 106.004266] Hardware name: AMD Celadon-RN/Celadon-RN, BIOS WCD9814N_Weekly_19_08_1 08/14/2019 [ 106.004272] RIP: 0010:strncpy+0x12/0x30 [ 106.004274] Code: c1 c0 11 48 c1 c6 15 48 31 d0 48 c1 c2 20 31 c2 89 d0 31 f0 41 5c 5d c3 55 48 85 d2 48 89 f8 48 89 e5 74 1e 48 01 fa 48 89 f9 <44> 0f b6 06 41 80 f8 01 44 88 01 48 83 de ff 48 83 c1 01 48 39 d1 [ 106.004278] RSP: 0018:ffffc092c1fd37a8 EFLAGS: 00010286 [ 106.004281] RAX: ffff9e943466a28c RBX: 00000000000036ed RCX: ffff9e943466a28c [ 106.004283] RDX: ffff9e943466a2ac RSI: 0000000000000000 RDI: ffff9e943466a28c [ 106.004285] RBP: ffffc092c1fd37a8 R08: ffff9e943d100000 R09: 0000000000000228 [ 106.004287] R10: ffff9e94418dc5a8 R11: ffff9e944746c0d0 R12: 0000000000000000 [ 106.004289] R13: ffff9e943fa1ec00 R14: ffff9e943466a200 R15: ffff9e943466a200 [ 106.004291] FS: 00007f7a022c5540(0000) GS:ffff9e9447ac0000(0000) knlGS:0000000000000000 [ 106.004294] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.004296] CR2: 0000000000000000 CR3: 00000001ff0b0000 CR4: 0000000000340ee0 [ 106.004298] Call Trace: [ 106.004382] kfd_topology_add_device+0x150/0x610 [amdgpu] [ 106.004445] kgd2kfd_device_init+0x2e0/0x4f0 [amdgpu] [ 106.004509] amdgpu_amdkfd_device_init+0x14c/0x1b0 [amdgpu] Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-and-Tested-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:06:54 -05:00
Bhawanpreet Lakha	d1082e23ee	drm/amd/display: Implement voltage limitation for dali [Why] we only want the lowest voltage to be available for dali. [How] Use the get_highest_allowed_voltage_level function to return 0 for dali Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:06:48 -05:00
Bhawanpreet Lakha	c4cacce785	drm/amd/display: add Asic ID for Dali Dali is a new asic revision based on raven2 Add the ID and ASICREV_IS_DALI define Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:06:41 -05:00
Andrey Grodzovsky	084fe13b2c	drm/amdgpu: Allow to reset to EERPOM table. The table grows quickly during debug/development effort when multiple RAS errors are injected. Allow to avoid this by setting table header back to empty if needed. v2: Switch to debugfs entry instead of load time parameter. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:06:34 -05:00
Andrey Grodzovsky	d01b400b1a	drm/amdgpu: Add amdgpu_ras_eeprom_reset_table This will allow to reset the table on the fly. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:06:27 -05:00
Tao Zhou	d99659a062	drm/amdgpu: rename umc ras_init to err_cnt_init this interface is related to specific version of umc, distinguish it from ras_late_init Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:06:20 -05:00
Tao Zhou	4930aabe7c	drm/amdgpu: move umc ras init to umc block move umc ras init from ras module to umc block, generic ras module should pay less attention to specific ras block. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:06:12 -05:00
Tao Zhou	86edcc7dba	drm/amdgpu: move umc late init from gmc to umc block umc late init is umc specific, it's more suitable to be put in umc block Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:06:03 -05:00
Guchun Chen	1bd252c57b	drm/amdgpu: remove duplicated header file include amdgpu_ras.h is already included. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:05:29 -05:00
Shirish S	a35ad98bf9	drm/amdgpu: remove needless usage of #ifdef define sched_policy in case CONFIG_HSA_AMD is not enabled, with this there is no need to check for CONFIG_HSA_AMD else where in driver code. Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shirish S <shirish.s@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:05:20 -05:00
Shirish S	8c9f69bc5c	drm/amdgpu: fix build error without CONFIG_HSA_AMD If CONFIG_HSA_AMD is not set, build fails: drivers/gpu/drm/amd/amdgpu/amdgpu_device.o: In function `amdgpu_device_ip_early_init': drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1626: undefined reference to `sched_policy' Use CONFIG_HSA_AMD to guard this. Fixes: 1abb680ad371 ("drm/amdgpu: disable gfxoff while use no H/W scheduling policy") Signed-off-by: Shirish S <shirish.s@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:05:07 -05:00
Evan Quan	38750f0303	drm/amd/powerplay: update smu11_driver_if_arcturus.h Also bump the SMU11_DRIVER_IF_VERSION_ARCT. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:05:00 -05:00
Evan Quan	04c572a0df	drm/amd/powerplay: issue DC-BTC for arcturus on SMU init Need to perform DC-BTC for arcturus on bootup. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 10:04:35 -05:00
Andrey Grodzovsky	4d1337d2e9	drm/amdgpu: Avoid RAS recovery init when no RAS support. Fixes driver load regression on APUs. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:59:35 -05:00
Christian König	cbfae36cea	drm/amdgpu: cleanup PTE flag generation v3 Move the ASIC specific code into a new callback function. v2: mask the flags for SI and CIK instead of a BUG_ON(). v3: remove last missed BUG_ON(). Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:59:29 -05:00
Christian König	71776b6dae	drm/amdgpu: cleanup mtype mapping Unify how we map the UAPI flags to the PTE hardware flags for a mapping. Only the MTYPE is actually ASIC dependent, all other flags should be copied over 1 to 1 and ASIC differences are handled later on. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:59:21 -05:00
Tianci.Yin	1dd077bbba	drm/amdgpu: add navi14 PCI ID for work station SKU Add the navi14 PCI device id. Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Tianci.Yin <tianci.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:59:14 -05:00
Prike Liang	75a8957f80	drm/amd/powerplay: Add the interface for geting dpm current power state implement the sysfs power_dpm_state Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:59:07 -05:00
Philip Yang	cde85ac247	drm/amdgpu: check if nbio->ras_if exist To avoid NULL function pointer access. This happens on VG10, reboot command hangs and have to power off/on to reboot the machine. This is serial console log: [ OK ] Reached target Unmount All Filesystems. [ OK ] Reached target Final Step. Starting Reboot... [ 305.696271] systemd-shutdown[1]: Syncing filesystems and block devices. [ 306.947328] systemd-shutdown[1]: Sending SIGTERM to remaining processes... [ 306.963920] systemd-journald[1722]: Received SIGTERM from PID 1 (systemd-shutdow). [ 307.322717] systemd-shutdown[1]: Sending SIGKILL to remaining processes... [ 307.336472] systemd-shutdown[1]: Unmounting file systems. [ 307.454202] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro [ 307.480523] systemd-shutdown[1]: All filesystems unmounted. [ 307.486537] systemd-shutdown[1]: Deactivating swaps. [ 307.491962] systemd-shutdown[1]: All swaps deactivated. [ 307.497624] systemd-shutdown[1]: Detaching loop devices. [ 307.504418] systemd-shutdown[1]: All loop devices detached. [ 307.510418] systemd-shutdown[1]: Detaching DM devices. [ 307.565907] sd 2:0:0:0: [sda] Synchronizing SCSI cache [ 307.731313] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 307.738802] #PF: supervisor read access in kernel mode [ 307.744326] #PF: error_code(0x0000) - not-present page [ 307.749850] PGD 0 P4D 0 [ 307.752568] Oops: 0000 [#1] SMP PTI [ 307.756314] CPU: 3 PID: 1 Comm: systemd-shutdow Not tainted 5.2.0-rc1-kfd-yangp #453 [ 307.764644] Hardware name: ASUS All Series/Z97-PRO(Wi-Fi ac)/USB 3.1, BIOS 9001 03/07/2016 [ 307.773580] RIP: 0010:soc15_common_hw_fini+0x33/0xc0 [amdgpu] [ 307.779760] Code: 89 fb e8 60 f5 ff ff f6 83 50 df 01 00 04 75 3d 48 8b b3 90 7d 00 00 48 c7 c7 17 b8 530 [ 307.799967] RSP: 0018:ffffac9483153d40 EFLAGS: 00010286 [ 307.805585] RAX: 0000000000000000 RBX: ffff9eb299da0000 RCX: 0000000000000006 [ 307.813261] RDX: 0000000000000000 RSI: ffff9eb29e3508a0 RDI: ffff9eb29e350000 [ 307.820935] RBP: ffff9eb299da0000 R08: 0000000000000000 R09: 0000000000000000 [ 307.828609] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9eb299dbd1f8 [ 307.836284] R13: ffffffffc04f8368 R14: ffff9eb29cebd130 R15: 0000000000000000 [ 307.843959] FS: 00007f06721c9940(0000) GS:ffff9eb2a18c0000(0000) knlGS:0000000000000000 [ 307.852663] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 307.858842] CR2: 0000000000000000 CR3: 000000081d798005 CR4: 00000000001606e0 [ 307.866516] Call Trace: [ 307.869169] amdgpu_device_ip_suspend_phase2+0x80/0x110 [amdgpu] [ 307.875654] ? amdgpu_device_ip_suspend_phase1+0x4d/0xd0 [amdgpu] [ 307.882230] amdgpu_device_ip_suspend+0x2e/0x60 [amdgpu] [ 307.887966] amdgpu_pci_shutdown+0x2f/0x40 [amdgpu] [ 307.893211] pci_device_shutdown+0x31/0x60 [ 307.897613] device_shutdown+0x14c/0x1f0 [ 307.901829] kernel_restart+0xe/0x50 [ 307.905669] __do_sys_reboot+0x1df/0x210 [ 307.909884] ? task_work_run+0x73/0xb0 [ 307.913914] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 307.918970] do_syscall_64+0x4a/0x1c0 [ 307.922904] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 307.928336] RIP: 0033:0x7f0671cf8373 [ 307.932176] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 128 [ 307.952384] RSP: 002b:00007ffdd1723d68 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9 [ 307.960527] RAX: ffffffffffffffda RBX: 0000000001234567 RCX: 00007f0671cf8373 [ 307.968201] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead [ 307.975875] RBP: 00007ffdd1723dd0 R08: 0000000000000000 R09: 0000000000000000 [ 307.983550] R10: 0000000000000002 R11: 0000000000000202 R12: 00007ffdd1723dd8 [ 307.991224] R13: 0000000000000000 R14: 0000001b00000004 R15: 00007ffdd17240c8 [ 307.998901] Modules linked in: xt_MASQUERADE nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat nf_cos [ 308.026505] CR2: 0000000000000000 [ 308.039998] RIP: 0010:soc15_common_hw_fini+0x33/0xc0 [amdgpu] [ 308.046180] Code: 89 fb e8 60 f5 ff ff f6 83 50 df 01 00 04 75 3d 48 8b b3 90 7d 00 00 48 c7 c7 17 b8 530 [ 308.066392] RSP: 0018:ffffac9483153d40 EFLAGS: 00010286 [ 308.072013] RAX: 0000000000000000 RBX: ffff9eb299da0000 RCX: 0000000000000006 [ 308.079689] RDX: 0000000000000000 RSI: ffff9eb29e3508a0 RDI: ffff9eb29e350000 [ 308.087366] RBP: ffff9eb299da0000 R08: 0000000000000000 R09: 0000000000000000 [ 308.095042] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9eb299dbd1f8 [ 308.102717] R13: ffffffffc04f8368 R14: ffff9eb29cebd130 R15: 0000000000000000 [ 308.110394] FS: 00007f06721c9940(0000) GS:ffff9eb2a18c0000(0000) knlGS:0000000000000000 [ 308.119099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 308.125280] CR2: 0000000000000000 CR3: 000000081d798005 CR4: 00000000001606e0 [ 308.135304] printk: systemd-shutdow: 3 output lines suppressed due to ratelimiting [ 308.143518] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 [ 308.151798] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xff) [ 308.171775] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]--- Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:58:52 -05:00
Xiaojie Yuan	bfa603aa5e	drm/amdgpu: fix null pointer deref in firmware header printing v2: declare as (struct common_firmware_header *) type because struct xxx_firmware_header inherits from it When CE's ucode_id(8) is used to get sdma_hdr, we will be accessing an unallocated amdgpu_firmware_info instance. This issue appears on rhel7.7 with gcc 4.8.5. Newer compilers might have optimized out such 'defined but not referenced' variable. [ 1120.798564] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a [ 1120.806703] IP: [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu] [ 1120.813693] PGD 80000002603ff067 PUD 271b8d067 PMD 0 [ 1120.818931] Oops: 0000 [#1] SMP [ 1120.822245] Modules linked in: amdgpu(OE+) amdkcl(OE) amd_iommu_v2 amdttm(OE) amd_sched(OE) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun bridge stp llc devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dm_mirror dm_region_hash dm_log dm_mod intel_pmc_core intel_powerclamp coretemp intel_rapl joydev kvm_intel eeepc_wmi asus_wmi kvm sparse_keymap iTCO_wdt irqbypass rfkill crc32_pclmul snd_hda_codec_realtek mxm_wmi ghash_clmulni_intel intel_wmi_thunderbolt iTCO_vendor_support snd_hda_codec_generic snd_hda_codec_hdmi aesni_intel lrw gf128mul glue_helper ablk_helper sg cryptd pcspkr snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd pinctrl_sunrisepoint pinctrl_intel soundcore acpi_pad mei_me wmi mei i2c_i801 pcc_cpufreq ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic i915 i2c_algo_bit iosf_mbi drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm ptp libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw pps_core drm_panel_orientation_quirks video i2c_hid [ 1120.954136] CPU: 4 PID: 2426 Comm: modprobe Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1 [ 1120.964390] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 1302 11/09/2015 [ 1120.973321] task: ffff991ef1e3c1c0 ti: ffff991ee625c000 task.ti: ffff991ee625c000 [ 1120.981020] RIP: 0010:[<ffffffffc0e3c9b3>] [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu] [ 1120.990483] RSP: 0018:ffff991ee625f950 EFLAGS: 00010202 [ 1120.995935] RAX: 0000000000000002 RBX: ffff991edf6b2d38 RCX: ffff991edf6a0000 [ 1121.003391] RDX: 0000000000000000 RSI: ffff991f01d13898 RDI: ffffffffc110afb3 [ 1121.010706] RBP: ffff991ee625f9b0 R08: 0000000000000000 R09: 0000000000000000 [ 1121.018029] R10: 00000000000004c4 R11: ffff991ee625f64e R12: ffff991edf6b3220 [ 1121.025353] R13: ffff991edf6a0000 R14: 0000000000000008 R15: ffff991edf6b2d30 [ 1121.032666] FS: 00007f97b0c0b740(0000) GS:ffff991f01d00000(0000) knlGS:0000000000000000 [ 1121.041000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1121.046880] CR2: 000000000000000a CR3: 000000025e604000 CR4: 00000000003607e0 [ 1121.054239] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1121.061631] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1121.068938] Call Trace: [ 1121.071494] [<ffffffffc0e3dba8>] psp_hw_init+0x218/0x270 [amdgpu] [ 1121.077886] [<ffffffffc0da3188>] amdgpu_device_fw_loading+0xe8/0x160 [amdgpu] [ 1121.085296] [<ffffffffc0e3b34c>] ? vega10_ih_irq_init+0x4bc/0x730 [amdgpu] [ 1121.092534] [<ffffffffc0da5c75>] amdgpu_device_init+0x1495/0x1c90 [amdgpu] [ 1121.099675] [<ffffffffc0da9cab>] amdgpu_driver_load_kms+0x8b/0x2f0 [amdgpu] [ 1121.106888] [<ffffffffc01b25cf>] drm_dev_register+0x12f/0x1d0 [drm] [ 1121.113419] [<ffffffffa4dcdfd8>] ? pci_enable_device_flags+0xe8/0x140 [ 1121.120183] [<ffffffffc0da260a>] amdgpu_pci_probe+0xca/0x170 [amdgpu] [ 1121.126919] [<ffffffffa4dcf97a>] local_pci_probe+0x4a/0xb0 [ 1121.132622] [<ffffffffa4dd10c9>] pci_device_probe+0x109/0x160 [ 1121.138607] [<ffffffffa4eb4205>] driver_probe_device+0xc5/0x3e0 [ 1121.144766] [<ffffffffa4eb4603>] __driver_attach+0x93/0xa0 [ 1121.150507] [<ffffffffa4eb4570>] ? __device_attach+0x50/0x50 [ 1121.156422] [<ffffffffa4eb1da5>] bus_for_each_dev+0x75/0xc0 [ 1121.162213] [<ffffffffa4eb3b7e>] driver_attach+0x1e/0x20 [ 1121.167771] [<ffffffffa4eb3620>] bus_add_driver+0x200/0x2d0 [ 1121.173590] [<ffffffffa4eb4c94>] driver_register+0x64/0xf0 [ 1121.179345] [<ffffffffa4dd0905>] __pci_register_driver+0xa5/0xc0 [ 1121.185593] [<ffffffffc099f000>] ? 0xffffffffc099efff [ 1121.190914] [<ffffffffc099f0a4>] amdgpu_init+0xa4/0xb0 [amdgpu] [ 1121.197101] [<ffffffffa4a0210a>] do_one_initcall+0xba/0x240 [ 1121.202901] [<ffffffffa4b1c90a>] load_module+0x271a/0x2bb0 [ 1121.208598] [<ffffffffa4dad740>] ? ddebug_proc_write+0x100/0x100 [ 1121.214894] [<ffffffffa4b1ce8f>] SyS_init_module+0xef/0x140 [ 1121.220698] [<ffffffffa518bede>] system_call_fastpath+0x25/0x2a [ 1121.226870] Code: b4 01 60 a2 00 00 31 c0 e8 83 60 33 e4 41 8b 47 08 48 8b 4d d0 48 c7 c7 b3 af 10 c1 48 69 c0 68 07 00 00 48 8b 84 01 60 a2 00 00 <48> 8b 70 08 31 c0 48 89 75 c8 e8 56 60 33 e4 48 8b 4d d0 48 c7 [ 1121.247422] RIP [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu] [ 1121.254432] RSP <ffff991ee625f950> [ 1121.258017] CR2: 000000000000000a [ 1121.261427] ---[ end trace e98b35387ede75bd ]--- Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Fixes: `c5fb912653` ("drm/amdgpu: add firmware header printing for psp fw loading (v2)") Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:56:01 -05:00
Huang Rui	4042a18872	drm/amdkfd: enable renoir while device probes This patch is to add asic flag to enable device probe during kfd init. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:55:50 -05:00
Huang Rui	aa978594cf	drm/amdgpu: disable gfxoff while use no H/W scheduling policy While gfxoff is enabled, the mmVM_XXX registers will be 0xfffffff while the GFX is in "off" state. KFD queue creattion doesn't use ring based method, so it will trigger a VM fault. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:55:41 -05:00
Huang Rui	f5d843d4ea	drm/amdkfd: add renoir kfd topology This patch adds renoir kfd topology which is the same with Raven. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:55:35 -05:00
Huang Rui	444d4f5fd3	drm/amdkfd: add package manager for renoir Renoir use GFX v9, so adds v9 package manager. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:55:29 -05:00
Huang Rui	59a6fc1aef	drm/amdkfd: init kernel queue for renoir Renoir is GFX v9, so init v9 kernel queue. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:55:21 -05:00
Huang Rui	4d85488cd9	drm/amdkfd: init kfd apertures v9 for renoir Renoir is GMC v9, so init v9 kfd apertures. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:55:14 -05:00
Huang Rui	514e5e7e60	drm/amdkfd: add renoir type for the workaround of iommu v2 (v2) Renoir is the same with Raven, will enable iommu event in future. v2: fix the checking (Thong) Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:55:08 -05:00
Huang Rui	5a959a8988	drm/amdkfd: enable kfd device queue manager v9 for renoir Renoir is GFX9, so enable v9 devcie queue manager. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:55:02 -05:00
Huang Rui	2b9c221119	drm/amdkfd: add renoir kfd device info (v2) This patch inits renoir kfd device info, so we treat renoir as "dgpu" (bypass iommu v2). Will enable needs_iommu_device till renoir iommu is ready. v2: rebase and align the drm-next Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:54:55 -05:00
Huang Rui	a8d42f174d	drm/amdkfd: add renoir cache info for CRAT (v2) Renoir's cache info should be the same with raven and carrizo's. v2: fix missed "break" Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:54:49 -05:00
Yong Zhao	8099ae40d8	drm/amdkfd: Support Navi14 in KFD Initial support of Navi14 in KFD. The device IDs will be added later. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:54:40 -05:00
Felix Kuehling	7cae706193	drm/amdgpu: Disable retry faults in VMID0 There is no point retrying page faults in VMID0. Those faults are always fatal. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-and-Tested-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:54:34 -05:00
Yong Zhao	4e66d7d215	drm/amdgpu: Add a kernel parameter for specifying the asic type As more and more new asics start to reuse the old device IDs before launch, there is a need to quickly override the existing asic type corresponding to the reused device ID through a kernel parameter. With this, engineers no longer need to rely on local hack patches, facilitating cooperation across teams. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-16 09:54:25 -05:00
Alex Deucher	bb42eda284	drm/amdgpu/irq: check if nbio funcs exist We need to check if the nbios funcs exist before checking the individual pointers. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>	2019-09-16 09:54:14 -05:00
Qingqing Zhuo	dabeea6427	drm/amd/display: replace FIXME with TODO Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 18:03:34 -05:00
Jing Zhou	b131932215	drm/amd/display: verify stream link before link test [Why] DP1.2 LL CTS test failure. [How] The failure is caused by not verify stream link is equal to link, only check stream and link is not null. Signed-off-by: Jing Zhou <Jing.Zhou@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 18:03:28 -05:00
Charlene Liu	d6bbece2c4	drm/amd/display: dce11.x /dce12 update formula input [Description] 1. OUTSTANDING_REQUEST_LIMIT update from 0xFF to 0x1F (HW doc update) 2. using memory type to convert UMC's MCLK to Yclk. Signed-off-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 18:03:21 -05:00
Bayan Zabihiyan	0417df1699	drm/amd/display: Isolate DSC module from driver dependencies [Why] Edid Utility wishes to include DSC module from driver instead of doing it's own logic which will need to be updated every time someone modifies the driver logic. [How] Modify some functions such that we dont need to pass the entire DC structure as parameter. -Remove DC inclusion from module. -Filter out problematic types and inclusions Signed-off-by: Bayan Zabihiyan <bayan.zabihiyan@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 18:03:13 -05:00
Jaehyun Chung	785908cf19	drm/amd/display: OTC underflow fix [Why] Underflow occurs on some display setups(repro'd on 3x4K HDR) on boot, mode set, and hot-plugs with. Underflow occurs because mem clk is not set high after disabling pstate switching. This behaviour occurs because some calculations assumed displays were synchronized. [How] Add a condition to check if timing sync is disabled so that synchronized vblank can be set to false. Signed-off-by: Jaehyun Chung <jaehyun.chung@amd.com> Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 18:03:06 -05:00
Jun Lei	119630061e	drm/amd/display: remove hw access from dc_destroy [why] dc_destroy should only clean up SW, this is because GPUs may be removed before driver unload, leading to HW to be unavailable. [how] remove GPIO close as part of GPIO destroy, this is unnecessary because GPIO is not shared, and GPIOs are generally closed after being opened Add tracking to HW access during destructor to make future issues easier to pinpoint, and block access to prevent hangs. Signed-off-by: Jun Lei <Jun.Lei@amd.com> Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-09-13 18:02:52 -05:00

1 2 3 4 5 ...

11899 Commits