linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-16 13:46:44 +07:00

Author	SHA1	Message	Date
Benjamin Herrenschmidt	9a1a70ae15	powerpc/pci: Don't try to allocate resources that will be reassigned When we know we will reassign all resources, trying (and failing) to allocate them initially is fairly pointless and leads to a lot of scary messages in the kernel log Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:49 +10:00
Benjamin Herrenschmidt	08a45b320a	powerpc/powernv/pci: Check status of a PHB before using it If the firmware encounters an error (internal or HW) during initialization of a PHB, it might leave the device-node in the tree but mark it disabled using the "status" property. We should check it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:49 +10:00
Benjamin Herrenschmidt	a1339faf72	powerpc/powernv/pci: Use the device-tree to get available range of M64's M64's are the configurable 64-bit windows that cover the 64-bit MMIO space. We used to hard code 16 windows. Newer chips might have a variable number and might need to reserve some as well (for example on PHB4/POWER9, M32 and M64 are actually unified and we use M64#0 to map the 32-bit space). So newer OPALs will provide a property we can use to know what range of windows is available. The property is named so that it can eventually support multiple ranges but we only use the first one for now. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:48 +10:00
Benjamin Herrenschmidt	f0228c4130	powerpc/powernv/pci: Fallback to OPAL for TCE invalidations If we don't find registers for the PHB or don't know the model specific invalidation method, use OPAL calls instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:48 +10:00
Benjamin Herrenschmidt	fd141d1a99	powerpc/powernv/pci: Rework accessing the TCE invalidate register It's architected, always in a known place, so there is no need to keep a separate pointer to it, we use the existing "regs", and we complement it with a real mode variant. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> # Conflicts: # arch/powerpc/platforms/powernv/pci-ioda.c # arch/powerpc/platforms/powernv/pci.h Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:47 +10:00
Benjamin Herrenschmidt	08acce1cab	powerpc/powernv/pci: Remove SWINV constants and obsolete TCE code We have some obsolete code in pnv_pci_p7ioc_tce_invalidate() to handle some internal lab tools that have stopped being useful a long time ago. Remove that along with the definition and test for the TCE_PCI_SWINV_* flags whose value is basically always the same. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:47 +10:00
Benjamin Herrenschmidt	a34ab7c328	powerpc/powernv/pci: Rename TCE invalidation calls The TCE invalidation functions are fairly implementation specific, and while the IODA specs more/less describe the register, in practice various implementation workarounds may be required. So name the functions after the target PHB. Note today and for the foreseeable future, there's a 1:1 relationship between an IODA version and a PHB implementation. There exist another variant of IODA1 (Torrent) but we never supported in with OPAL and never will. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:46 +10:00
Benjamin Herrenschmidt	69c592ed40	powerpc/opal: Add real mode call wrappers Replace the old generic opal_call_realmode() with proper per-call wrappers similar to the normal ones and convert callers. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:46 +10:00
Benjamin Herrenschmidt	b7d6bf4fdd	powerpc/pseries/pci: Remove obsolete SW invalidate That was used by some old IBM internal bringup tools and is no longer relevant. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:45 +10:00
Benjamin Herrenschmidt	fb111334e4	powerpc/powernv: Discover IODA3 PHBs We instanciate them as IODA2. We also change the MSI EOI hack to only kick on PHB3 since it will not be needed on any new implementation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:45 +10:00
Benjamin Herrenschmidt	d74361881f	powerpc/xics: Add ICP OPAL backend This adds a new XICS backend that uses OPAL calls, which can be used when we don't have native support for the platform interrupt controller. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:45 +10:00
Benjamin Herrenschmidt	1d607bb3bd	powerpc/irq: Add mechanism to force a replay of interrupts Calling this function with interrupts soft-disabled will cause a replay of the external interrupt vector when they are re-enabled. This will be used by the OPAL XICS backend (and latter by the native XIVE code) to handle EOI signaling that there are more interrupts to fetch from the hardware since the hardware won't issue another HW interrupt in that case. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:44 +10:00
Benjamin Herrenschmidt	9baaef0a22	powerpc/irq: Add support for HV virtualization interrupts This will be delivering external interrupts from the XIVE to the Hypervisor. We treat it as a normal external interrupt for the lazy irq disable code (so it will be replayed as a 0x500) and route it to do_IRQ. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:44 +10:00
Benjamin Herrenschmidt	b88d4bce2b	powerpc/book64s: Move a few exception common handlers to make room This moves the CBE RAS and facility unavailable "common" handlers down to after the FWNMI page. This frees up some space in the very demanded spaces before the relocation-on vectors and before the FWNMI page. They are still within 64K of __start, so CONFIG_RELOCATABLE should still work. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-17 16:42:34 +10:00
Benjamin Herrenschmidt	9fedd3f880	powerpc/powernv: Add XICS emulation APIs OPAL provides an emulated XICS interrupt controller to use as a fallback on newer processors that don't have a XICS. It's meant as a way to provide backward compatibility with future processors. Add the corresponding interfaces. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:43 +10:00
Shreyas B. Prabhu	c0691f9dd2	powerpc/powernv: Use deepest stop state when cpu is offlined If hardware supports stop state, use the deepest stop state when the cpu is offlined. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:42 +10:00
Shreyas B. Prabhu	3005c597ba	cpuidle/powernv: Add support for POWER ISA v3 idle states POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. b) new per thread SPR named PSSCR is added which controls the behavior of stop instruction. Supported idle states and value to be written to PSSCR register to enter any idle state is exposed via ibm,cpu-idle-state-names and ibm,cpu-idle-state-psscr respectively. To enter an idle state, platform provided power_stop() needs to be invoked with the appropriate PSSCR value. This patch adds support for this new mechanism in cpuidle powernv driver. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com> Cc: linux-pm@vger.kernel.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:42 +10:00
Shreyas B. Prabhu	957efcedee	cpuidle/powernv: cleanup cpuidle-powernv.c - Use stack instead of kzalloc'ed memory for variables while probing device tree for idle states. - Set cap for number of idle states that can be added to cpuidle_state_table - Minor change in way we check of_property_read_u32_array for error for sake of consistency - Drop unnecessary "&" while assigning function pointer Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-pm@vger.kernel.org Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:42 +10:00
Shreyas B. Prabhu	169f3fae0c	cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES Use cpuidle's CPUIDLE_STATE_MAX macro instead of powernv specific MAX_POWERNV_IDLE_STATES. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-pm@vger.kernel.org Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:41 +10:00
Shreyas B. Prabhu	bcef83a00d	powerpc/powernv: Add platform support for stop instruction POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. This instruction replaces instructions like nap, sleep, rvwinkle. b) new per thread SPR named Processor Stop Status and Control Register (PSSCR) is added which controls the behavior of stop instruction. PSSCR layout: ---------------------------------------------------------- \| PLS \| /// \| SD \| ESL \| EC \| PSLL \| /// \| TR \| MTL \| RL \| ---------------------------------------------------------- 0 4 41 42 43 44 48 54 56 60 PSSCR key fields: Bits 0:3 - Power-Saving Level Status. This field indicates the lowest power-saving state the thread entered since stop instruction was last executed. Bit 42 - Enable State Loss 0 - No state is lost irrespective of other fields 1 - Allows state loss Bits 44:47 - Power-Saving Level Limit This limits the power-saving level that can be entered into. Bits 60:63 - Requested Level Used to specify which power-saving level must be entered on executing stop instruction This patch adds support for stop instruction and PSSCR handling. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:41 +10:00
Shreyas B. Prabhu	0dfffb48ce	powerpc/powernv: abstraction for saving SPRs before entering deep idle states Create a function for saving SPRs before entering deep idle states. This function can be reused for POWER9 deep idle states. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:40 +10:00
Shreyas B. Prabhu	4eae2c9ae5	powerpc/powernv: Make pnv_powersave_common more generic pnv_powersave_common does common steps needed before entering idle state and eventually changes MSR to MSR_IDLE and does rfid to pnv_enter_arch207_idle_mode. Move the updation of HSTATE_HWTHREAD_STATE to pnv_powersave_common from pnv_enter_arch207_idle_mode and make it more generic by passing the rfid address as a function parameter. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:40 +10:00
Shreyas B. Prabhu	5fa6b6bd7a	powerpc/powernv: Rename reusable idle functions to hardware agnostic names Functions like power7_wakeup_loss, power7_wakeup_noloss, power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can also be used by POWER9. Hence rename these functions hardware agnostic names. Suggested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:39 +10:00
Shreyas B. Prabhu	83289f909a	powerpc/powernv: Rename idle_power7.S to idle_book3s.S idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next patch for POWER9. Rename the file to a non-hardware specific name. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:39 +10:00
Shreyas B. Prabhu	1706567117	powerpc/kvm: make hypervisor state restore a function In the current code, when the thread wakes up in reset vector, some of the state restore code and check for whether a thread needs to branch to kvm is duplicated. Reorder the code such that this duplication is avoided. At a higher level this is what the change looks like- Before this patch - power7_wakeup_tb_loss: restore hypervisor state if (thread needed by kvm) goto kvm_start_guest restore nvgprs, cr, pc rfid to process context power7_wakeup_loss: restore nvgprs, cr, pc rfid to process context reset vector: if (waking from deep idle states) goto power7_wakeup_tb_loss else if (thread needed by kvm) goto kvm_start_guest goto power7_wakeup_loss After this patch - power7_wakeup_tb_loss: restore hypervisor state return power7_restore_hyp_resource(): if (waking from deep idle states) goto power7_wakeup_tb_loss return power7_wakeup_loss: restore nvgprs, cr, pc rfid to process context reset vector: power7_restore_hyp_resource() if (thread needed by kvm) goto kvm_start_guest goto power7_wakeup_loss Reviewed-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:38 +10:00
Shreyas B. Prabhu	bfd1b7ae5e	powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkle Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:38 +10:00
Stewart Smith	ec5619fdba	powerpc/lib: Clarify that adde is an instruction and we mean plural Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:37 +10:00
Nathan Fontenot	fdb4f6e99f	powerpc/pseries: Remove call to memblock_add() The call to memblock_add is not needed, this is already done by memory_add(). This patch removes this call which shrinks dlpar_add_lmb_memory() enough that it can be merged into dlpar_add_lmb(). Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:37 +10:00
Nathan Fontenot	ec99907244	powerpc/pseries: Auto-online hotplugged memory A recent update (commit id `31bc3858ea`) allows for automatically onlining memory that is added. This patch sets the config option CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y for pseries and updates the pseries memory hotplug code so that DLPAR added memory can be automatically onlined instead of explicitly onlining the memory. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:37 +10:00
Nathan Fontenot	c05a5a4096	powerpc/pseries: Dynamic add entires to associativity lookup array Dynamically add entries to the associativity lookup array The ibm,associativity-lookup-arrays property may only contain associativity arrays for LMBs present at boot time. When hotplug adding a LMB its associativity array may not be in the associativity lookup array, this patch adds the ability to add new entries to the associativity lookup array. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 20:18:26 +10:00
Nathan Fontenot	c2101c9039	powerpc/pseries: Move property cloning into its own routine Move property cloning code into its own routine Split the pieces of dlpar_clone_drconf_property() that create a copy of the property struct into its own routine. This allows for creating clones of more than just the ibm,dynamic-memory property used in memory hotplug. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 15:02:26 +10:00
Michael Ellerman	236977609a	powerpc/pseries: HVC early debug options should depend on HVC_CONSOLE The pseries HVC early debug options, CONFIG_PPC_EARLY_DEBUG_LPAR and CONFIG_PPC_EARLY_DEBUG_LPAR_HVSI both require code that is part of the hvc driver. If we turn them on but not CONFIG_HVC_CONSOLE then we get: arch/powerpc/kernel/built-in.o: In function `.udbg_early_init': arch/powerpc/kernel/built-in.o:(.debug_addr+0x9a00): undefined reference to `udbg_init_debug_lpar' Similarly for HVSI. So make them both depend on CONFIG_HVC_CONSOLE. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 15:02:26 +10:00
Michael Neuling	6bcb80143e	powerpc/tm: Fix stack pointer corruption in __tm_recheckpoint() At the start of __tm_recheckpoint() we save the kernel stack pointer (r1) in SPRG SCRATCH0 (SPRG2) so that we can restore it after the trecheckpoint. Unfortunately, the same SPRG is used in the SLB miss handler. If an SLB miss is taken between the save and restore of r1 to the SPRG, the SPRG is changed and hence r1 is also corrupted. We can end up with the following crash when we start using r1 again after the restore from the SPRG: Oops: Bad kernel stack pointer, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries CPU: 658 PID: 143777 Comm: htm_demo Tainted: G EL X 4.4.13-0-default #1 task: c0000b56993a7810 ti: c00000000cfec000 task.ti: c0000b56993bc000 NIP: c00000000004f188 LR: 00000000100040b8 CTR: 0000000010002570 REGS: c00000000cfefd40 TRAP: 0300 Tainted: G EL X (4.4.13-0-default) MSR: 8000000300001033 <SF,ME,IR,DR,RI,LE> CR: 02000424 XER: 20000000 CFAR: c000000000008468 DAR: 00003ffd84e66880 DSISR: 40000000 SOFTE: 0 PACATMSCRATCH: 00003ffbc865e680 GPR00: fffffffcfabc4268 00003ffd84e667a0 00000000100d8c38 000000030544bb80 GPR04: 0000000000000002 00000000100cf200 0000000000000449 00000000100cf100 GPR08: 000000000000c350 0000000000002569 0000000000002569 00000000100d6c30 GPR12: 00000000100d6c28 c00000000e6a6b00 00003ffd84660000 0000000000000000 GPR16: 0000000000000003 0000000000000449 0000000010002570 0000010009684f20 GPR20: 0000000000800000 00003ffd84e5f110 00003ffd84e5f7a0 00000000100d0f40 GPR24: 0000000000000000 0000000000000000 0000000000000000 00003ffff0673f50 GPR28: 00003ffd84e5e960 00000000003d0f00 00003ffd84e667a0 00003ffd84e5e680 NIP [c00000000004f188] restore_gprs+0x110/0x17c LR [00000000100040b8] 0x100040b8 Call Trace: Instruction dump: f8a1fff0 e8e700a8 38a00000 7ca10164 e8a1fff8 e821fff0 7c0007dd 7c421378 7db142a6 7c3242a6 38800002 7c810164 <e9c100e0> e9e100e8 ea0100f0 ea2100f8 We hit this on large memory machines (> 2TB) but it can also be hit on smaller machines when 1TB segments are disabled. To hit this, you also need to be virtualised to ensure SLBs are periodically removed by the hypervisor. This patches moves the saving of r1 to the SPRG to the region where we are guaranteed not to take any further SLB misses. Fixes: `98ae22e15b` ("powerpc: Add helper functions for transactional memory context switching") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-15 15:00:18 +10:00
Michael Ellerman	b5f1bf48f2	powerpc fixes for 4.7 #5 - tm: Always reclaim in start_thread() for exec() class syscalls from Cyril Bur - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael Neuling - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan - Initialise pci_io_base as early as possible from Darren Stevens -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXeEmsAAoJEFHr6jzI4aWAwMMQAKs/u9rwB3gpOkNJSHajN1Dd kdqDufzLxLDwbWnMfqM1+bcO2EOjPhKbtpbzhG6oeiET8undRRoLsjHS5rZeYK5h cviRPEJ/Yz8ZWaIgFGI8+02gXwU0MJhuTY8NPexXmsh4FRdKYwEuCIJShl30lg22 P7UrJ2SCNM+H/uZyS07B7thiwBeAKSp6VkLTpuW/QDz2j1ra/F22dTh7c0Agdahd INAMAnh9nYeuMVYn4XjOOlQ07JnBTuf1/W5Wxlw4i/86rVq+Hy8zh5r1X52oysR5 lZl63B9q3agKG9cc9lSN2ibTDVerlFMwB2QysX2a6Uy7+y2SB3hS7VS1RTXCh3hg /omApGGVW3Hh+E2CuKfFLQySU55NRpLAoTGravGr/KsH4wZP/n/fkrctldCrqm7P sTPT52+t+iJQk4fiskRY3yQ7DTTnt3rTC8MJRGqvLuCheolLll4NQaWOF75AJP+7 WFWtC4QHOTPERMkhqLnZDG2vNuDg1H8chuZ2+PxtIs6G1vuOEun+MTZAYh4u6XWE bAIT9rV3xBdE17bzYOQz7lU1y7yNVtP7xkm0HIOAHlU4gUrjQp5u8F3TnPW3/M0m 8GeaZdrPjhsaNg31YZODAeM8Ddf+N9d2a2VPIr/fzytURhMe0ss3Z/MdMoYRATab Lh1o+G3gDo9MVaphoJ3w =oEAY -----END PGP SIGNATURE----- Merge tag 'powerpc-4.7-5' into next Pull in the fixes we sent during 4.7, we have code we want to merge into next that depends on some of them.	2016-07-15 14:57:47 +10:00
Daniel Axtens	95ec77c06e	powerpc: Make ppc_md.{halt, restart} __noreturn powernv marks it's halt and restart calls as __noreturn. However, ppc_md does not have this annotation. Add the annotation to ppc_md, and then to every halt/restart function that is missing it. Additionally, I have verified that all of these functions do not return. Occasionally I have added a spin loop to be sure. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 21:12:06 +10:00
Daniel Axtens	62c2c5cf38	powerpc/sparse: Pass endianness to sparse Explicitly give sparse an endianness in the Makefile, so that it doesn't get confused. Normally we have #ifdef one and #else the other, so it doesn't usually matter, but we have been bitten by it before, and indeed this patch fixes a number of sparse errors. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:44:03 +10:00
Daniel Axtens	f8750513b7	powerpc/kvm: Clarify __user annotations kvmppc_h_put_tce_indirect labels a u64 pointer as __user. It also labelled the u64 where get_user puts the result as __user. This isn't a pointer and so doesn't need to be labelled __user. Split the u64 value definition onto a new line to make it clear that it doesn't get the annotation. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:43:50 +10:00
Anna-Maria Gleixner	c011926fcb	powerpc/pmac/smp: Add missing FROZEN hotplug notifier transitions The FROZEN transitions are used when a CPU suspends/resumes. In case of a suspend/resume, only the up prepare (CPU_UP_PREPARE_FROZEN) is handled. The error handling transition CPU_UP_CANCELED_FROZEN as well as the CPU_ONLINE_FROZEN transition are not handled. Masking the switch case action argument with ~CPU_TASKS_FROZEN, to handle all FROZEN tasks the same way than the corresponding non frozen tasks. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:39:58 +10:00
Andrew Donnellan	b0b5e5918a	cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cards Add a new API, cxl_check_and_switch_mode() to allow for switching of bi-modal CAPI cards, such as the Mellanox CX-4 network card. When a driver requests to switch a card to CAPI mode, use PCI hotplug infrastructure to remove all PCI devices underneath the slot. We then write an updated mode control register to the CAPI VSEC, hot reset the card, and reprobe the card. As the card may present a different set of PCI devices after the mode switch, use the infrastructure provided by the pnv_php driver and the OPAL PCI slot management facilities to ensure that: * the old devices are removed from both the OPAL and Linux device trees * the new devices are probed by OPAL and added to the OPAL device tree * the new devices are added to the Linux device tree and probed through the regular PCI device probe path As such, introduce a new option, CONFIG_CXL_BIMODAL, with a dependency on the pnv_php driver. Refactor existing code that touches the mode control register in the regular single mode case into a new function, setup_cxl_protocol_area(). Co-authored-by: Ian Munsie <imunsie@au1.ibm.com> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:28:11 +10:00
Andrew Donnellan	5473a6bf63	PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state When calling pnv_php_set_slot_power_state() with state == OPAL_PCI_SLOT_OFFLINE, remove devices from the device tree as if we're dealing with OPAL_PCI_SLOT_POWER_OFF. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:28:10 +10:00
Andrew Donnellan	89379f165a	PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl The cxl driver will use infrastructure from pnv_php to handle device tree updates when switching bi-modal CAPI cards into CAPI mode. To enable this, export pnv_php_find_slot() and pnv_php_set_slot_power_state(), and add corresponding declarations, as well as the definition of struct pnv_php_slot, to asm/pnv-pci.h. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:28:08 +10:00
Ian Munsie	f67a6722d6	cxl: Workaround PE=0 hardware limitation in Mellanox CX4 The CX4 card cannot cope with a context with PE=0 due to a hardware limitation, resulting in: [ 34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939 [ 34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting Since the kernel API allocates a default context very early during device init that will almost certainly get Process Element ID 0 there is no easy way for us to extend the API to allow the Mellanox to inform us of this limitation ahead of time. Instead, work around the issue by extending the XSL structure to include a minimum PE to allocate. Although the bug is not in the XSL, it is the easiest place to work around this limitation given that the CX4 is currently the only card that uses an XSL. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:28:07 +10:00
Ian Munsie	a2f67d5ee8	cxl: Add support for interrupts on the Mellanox CX4 The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where interrupts are routed from the networking hardware to the XSL using the MSIX table, and from there will be transformed back into an MSIX interrupt using the cxl style interrupts (i.e. using IVTE entries and ranges to map a PE and AFU interrupt number to an MSIX address). We want to hide the implementation details of cxl interrupts as much as possible. To this end, we use a special version of the MSI setup & teardown routines in the PHB while in cxl mode to allocate the cxl interrupts and configure the IVTE entries in the process element. This function does not configure the MSIX table - the CX4 card uses a custom format in that table and it would not be appropriate to fill that out in generic code. The rest of the functionality is similar to the "Full MSI-X mode" described in the CAIA, and this could be easily extended to support other adapters that use that mode in the future. The interrupts will be associated with the default context. If the maximum number of interrupts per context has been limited (e.g. by the mlx5 driver), it will automatically allocate additional kernel contexts to associate extra interrupts as required. These contexts will be started using the same WED that was used to start the default context. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:27:08 +10:00
Ian Munsie	cbce0917e2	cxl: Add preliminary workaround for CX4 interrupt limitation The Mellanox CX4 has a hardware limitation where only 4 bits of the AFU interrupt number can be passed to the XSL when sending an interrupt, limiting it to only 15 interrupts per context (AFU interrupt number 0 is invalid). In order to overcome this, we will allocate additional contexts linked to the default context as extra address space for the extra interrupts - this will be implemented in the next patch. This patch adds the preliminary support to allow this, by way of adding a linked list in the context structure that we use to keep track of the contexts dedicated to interrupts, and an API to simultaneously iterate over the related context structures, AFU interrupt numbers and hardware interrupt numbers. The point of using a single API to iterate these is to hide some of the details of the iteration from external code, and to reduce the number of APIs that need to be exported via base.c to allow built in code to call. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:27:02 +10:00
Ian Munsie	79384e4b71	cxl: Add kernel APIs to get & set the max irqs per context These APIs will be used by the Mellanox CX4 support. While they function standalone to configure existing behaviour, their primary purpose is to allow the Mellanox driver to inform the cxl driver of a hardware limitation, which will be used in a future patch. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:26:57 +10:00
Ian Munsie	317f5ef1b3	cxl: Add support for using the kernel API with a real PHB This hooks up support for using the kernel API with a real PHB. After the AFU initialisation has completed it calls into the PHB code to pass it the AFU that will be used by other peer physical functions on the adapter. The cxl_pci_to_afu API is extended to work with peer PCI devices, retrieving the peer AFU from the PHB. This API may also now return an error if it is called on a PCI device that is not associated with either a cxl vPHB or a peer PCI device to an AFU, and this error is propagated down. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:26:56 +10:00
Ian Munsie	4361b03430	powerpc/powernv: Add support for the cxl kernel api on the real phb This adds support for the peer model of the cxl kernel api to the PowerNV PHB, in which physical function 0 represents the cxl function on the card (an XSL in the case of the CX4), which other physical functions will use for memory access and interrupt services. It is referred to as the peer model as these functions are peers of one another, as opposed to the Virtual PHB model which forms a hierarchy. This patch exports APIs to enable the peer mode, check if a PCI device is attached to a PHB in this mode, and to set and get the peer AFU for this mode. The cxl driver will enable this mode for supported cards by calling pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note that this mode is enabled, and switch out it's controller_ops for the cxl version. The cxl version of the controller_ops struct implements it's own versions of the enable_device_hook and release_device to handle refcounting on the peer AFU and to allocate a default context for the device. Once enabled, the cxl kernel API may not be disabled on a PHB. Currently there is no safe way to disable cxl mode short of a reboot, so until that changes there is no reason to support the disable path. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:26:54 +10:00
Ian Munsie	e4f5fc001a	cxl: Do not create vPHB if there are no AFU configuration records The vPHB model of the cxl kernel API is a hierarchy where the AFU is represented by the vPHB, and it's AFU configuration records are exposed as functions under that vPHB. If there are no AFU configuration records we will create a vPHB with nothing under it, which is a waste of resources and will opt us into EEH handling despite not having anything special to handle. This also does not make sense for cards using the peer model of the cxl kernel API, where the other functions of the device are exposed via additional peer physical functions rather than AFU configuration records. This model will also not work with the existing EEH handling in the cxl driver, as that is designed around the vPHB model. Skip creating the vPHB for AFUs without any AFU configuration records, and opt out of EEH handling for them. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:26:53 +10:00
Ian Munsie	a19bd79e31	cxl: Allow a default context to be associated with an external pci_dev The cxl kernel API has a concept of a default context associated with each PCI device under the virtual PHB. The Mellanox CX4 will also use the cxl kernel API, but it does not use a virtual PHB - rather, the AFU appears as a physical function as a peer to the networking functions. In order to allow the kernel API to work with those networking functions, we will need to associate a default context with them as well. To this end, refactor the corresponding code to do this in vphb.c and export it so that it can be called from the PHB code. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:26:52 +10:00
Ian Munsie	62ccf2d2ef	cxl: Move cxl_afu_get / cxl_afu_put to base The Mellanox CX4 uses a model where the AFU is one physical function of the device, and is used by other peer physical functions of the same device. This will require those other devices to grab a reference on the AFU when they are initialised to make sure that it does not go away during their lifetime. Move the AFU refcount functions to base.c so they can be called from the PHB code. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-14 20:26:36 +10:00

1 2 3 4 5 ...

602478 Commits