linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-10 13:06:39 +07:00

Author	SHA1	Message	Date
Will Deacon	2d81f1fe81	ARM: lib: add call_with_stack function for safely changing stack When disabling the MMU, it is necessary to take out a 1:1 identity map of the reset code so that it can safely be executed with and without the MMU active. To avoid the situation where the physical address of the reset code aliases with the virtual address of the active stack (which cannot be included in the 1:1 mapping), it is desirable to change to a new stack at a location which is less likely to alias. This code adds a new lib function, call_with_stack: void call_with_stack(void (fn)(void ), void arg, void sp); which changes the stack to point at the sp parameter, before invoking fn(arg) with the new stack selected. Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <dave.martin@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2011-12-12 16:07:35 +00:00
Linus Torvalds	4d4487140d	arm: remove "optimized" SHA1 routines Since commit `1eb19a12bd` ("lib/sha1: use the git implementation of SHA-1"), the ARM SHA1 routines no longer work. The reason? They depended on the larger 320-byte workspace, and now the sha1 workspace is just 16 words (64 bytes). So the assembly version would overwrite the stack randomly. The optimized asm version is also probably slower than the new improved C version, so there's no reason to keep it around. At least that was the case in git, where what appears to be the same assembly language version was removed two years ago because the optimized C BLK_SHA1 code was faster. Reported-and-tested-by: Joachim Eastwood <manabian@gmail.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-07 14:07:03 -07:00
Russell King	c9c6fe5033	ARM: Remove support for LinkUp Systems L7200 SDP. This hasn't been actively maintained for a long time, only receiving the occasional build update when things break. I doubt anyone has one of these on their desks anymore. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2010-06-24 15:41:31 +01:00
Lennert Buytenhek	39ec58f3fe	[ARM] alternative copy_to_user/clear_user implementation This implements {copy_to,clear}_user() by faulting in the userland pages and then using the regular kernel mem{cpy,set}() to copy the data (while holding the page table lock). This is a win if the regular mem{cpy,set}() implementations are faster than the user copy functions, which is the case e.g. on Feroceon, where 8-word STMs (which memcpy() uses under the right conditions) give significantly higher memory write throughput than a sequence of individual 32bit stores. Here are numbers for page sized buffers on some Feroceon cores: - copy_to_user on Orion5x goes from 51 MB/s to 83 MB/s - clear_user on Orion5x goes from 89MB/s to 314MB/s - copy_to_user on Kirkwood goes from 240 MB/s to 356 MB/s - clear_user on Kirkwood goes from 367 MB/s to 1108 MB/s - copy_to_user on Disco-Duo goes from 248 MB/s to 398 MB/s - clear_user on Disco-Duo goes from 328 MB/s to 1741 MB/s Because the setup cost is non negligible, this is worthwhile only if the amount of data to copy is large enough. The operation falls back to the standard implementation when the amount of data is below a certain threshold. This threshold was determined empirically, however some targets could benefit from a lower runtime determined value for optimal results eventually. In the copy_from_user() case, this technique does not provide any worthwhile performance gain due to the fact that any kind of read access allocates the cache and subsequent 32bit loads are just as fast as the equivalent 8-word LDM. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Nicolas Pitre <nico@marvell.com> Tested-by: Martin Michlmayr <tbm@cyrius.com>	2009-05-29 22:36:45 -04:00
Russell King	635f0258e5	[ARM] clps7500: remove support The CLPS7500 platform has not built since 2.6.22-git7 and there seems to be no interest in fixing it. So, remove the platform support. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2008-11-27 12:38:11 +00:00
Russell King	9641c7cc5a	[ARM] nommu: uaccess tweaks MMUless systems have only one address space for all threads, so both the usual access_ok() checks, and the exception handling do not make much sense. Hence, discard the fixup and exception tables at link time, use memcpy/memset for the user copy/clearing functions, and define the permission check macros to be constants. Some of this patch was derived from the equivalent patch by Hyok S. Choi. Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-06-28 17:59:46 +01:00
Hyok S. Choi	4682adcfb0	[ARM] nommu: trivial patch for arch/arm/lib/Makefile ifeq ($CONFIG_PREEMPT,y) -> ifeq ($(CONFIG_PREEMPT),y) Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-03-27 15:46:06 +01:00
Nicolas Pitre	fadab0943d	[ARM] 2948/1: new preemption safe copy_{to\|from}_user implementation Patch from Nicolas Pitre This patch provides a preemption safe implementation of copy_to_user and copy_from_user based on the copy template also used for memcpy. It is enabled unconditionally when CONFIG_PREEMPT=y. Otherwise if the configured architecture is not ARMv3 then it is enabled as well as it gives better performances at least on StrongARM and XScale cores. If ARMv3 is not too affected or if it doesn't matter too much then uaccess.S could be removed altogether. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-11-01 19:52:24 +00:00
Nicolas Pitre	7549423000	[ARM] 2947/1: copy template with new memcpy/memmove Patch from Nicolas Pitre This patch provides a new implementation for optimized memory copy functions on ARM. It is made of two levels: a template that consists of the core copy code and separate files that define macros to be used with the core code depending on the type of copy needed. This allows for best performances while sharing the same core for implementing memcpy(), copy_from_user() and copy_to_user() for instance. Two reasons for this work: 1) the current copy_to_user/copy_from_user implementation assumes no task switch will ever occur in the middle of each copied page making it completely unsafe with CONFIG_PREEMPT=y. 2) current copy implementations are measurably suboptimal and optimizing different implementations separately is a pain and more opportunities for bugs. The reason for (1) is the fact that copy inside user pages are performed with the ldm instruction which has no mean for testing user protections and could possibly race with process preemption bypassing the COW mechanism for example. This is a longstanding issue that we said ought to be fixed for about two years now. The solution is to substitute those ldm insns with a series of ldrt or strt insns to enforce user memory protection. At least on StrongARM and XScale cores the ldm is not faster than the equivalent ldr/str insns with a warm i-cache so there is no measurable performance degradation with that change. The fact that the copy code is a template makes it pretty easy to reuse the same core code as for memcpy and benefit from the same performance optimizations. Now (2) is best demonstrated with actual throughput measurements. First, here is a summary of memcopy tests performed on a StrongARM core: PTR alignment buffer size kernel version this version ------------------------------------------------------------ aligned 32 59.73 107.43 unaligned 32 61.31 74.72 aligned 100 132.47 136.15 unaligned 100 103.84 123.76 aligned 4096 130.67 130.80 unaligned 4096 130.68 130.64 aligned 1048576 68.03 68.18 unaligned 1048576 68.03 68.18 The buffer size is in bytes and the measured speed in MB/s. The copy was performed repeatedly with given buffer and throughput averaged over 3 seconds. Here we can see that the current kernel version has a higher entry cost that shows up with small buffers. As buffer size grows both implementation converge to the same throughput. Now here's the exact same test performed on an XScale core (PXA255): PTR alignment buffer size kernel version this version ------------------------------------------------------------ aligned 32 46.99 77.58 unaligned 32 53.61 59.59 aligned 100 107.19 136.59 unaligned 100 83.61 97.58 aligned 4096 129.13 129.98 unaligned 4096 128.36 128.53 aligned 1048576 53.76 59.41 unaligned 1048576 33.67 56.96 Again we can see the entry setup cost being higher for the current kernel before getting to the main copy loop. Then throughput results converge as long as the buffer remains in the cache. Then the 1MB case shows more differences probably due to better pld placement and/or less instruction interlocks in this proposed implementation. Disclaimer: The PXA system was running with slower clocks than the StrongARM system so trying to infer any conclusion by comparing those separate sets of results side by side would be completely inappropriate. So... What this patch does is to replace both memcpy and memmove with an implementation based on the provided copy code template. The memmove code is kept separate since it is used only if the memory areas involved do overlap in which case the code is a transposition of the template but with the copy occurring in the opposite direction (trying to fit that mode into the template turned it into a mess not worth it for memmove alone). And obviously both memcpy and memmove were tested with all kinds of pointer alignments and buffer sizes to exercise all code paths for correctness. The next patch will provide the now trivial replacement implementation copy_to_user and copy_from_user. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-11-01 19:52:23 +00:00
Nicolas Pitre	a0c6fdb987	[ARM] 2946/2: split --arch_clear_user() out of lib/uaccess.S Patch from Nicolas Pitre Required for future enhancement patches. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-11-01 19:52:22 +00:00
Nicolas Pitre	c09f98271f	[ARM] 2930/1: optimized sha1 implementation for ARM Patch from Nicolas Pitre Here's an ARM assembly SHA1 implementation to replace the default C version. It is approximately 50% faster than the generic C version. On an XScale processor running at 400MHz: generic C version: 9.8 MB/s my version: 14.5 MB/s This code is useful to quite a few callers in the tree: crypto/sha1.c: sha_transform(sctx->state, sctx->buffer, temp); crypto/sha1.c: sha_transform(sctx->state, &data[i], temp); drivers/char/random.c: sha_transform(buf, (__u8 )r->pool+i, buf + 5); drivers/char/random.c: sha_transform(buf, (__u8 )data, buf + 5); net/ipv4/syncookies.c: sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5); Signed-off-by: Nicolas Pitre <nico@cam.org> Seems to work fine on big-endian as well. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-10-28 15:26:40 +01:00
Nicolas Pitre	c7e7887666	[PATCH] ARM: 2723/2: remove __udivdi3 and __umoddi3 from the kernel Patch from Nicolas Pitre Those are big, slow and generally not recommended for kernel code. They are even not present on i386. So it should be concluded that one could as well get away with do_div() alone. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-06-29 18:10:54 +01:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

13 Commits