License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to apply to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0                                                 11139
and resulted in the first patch in this series.
If the file was in a */uapi/* path, the identifier was "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was "GPL-2.0". The results were:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           270
GPL-2.0+ WITH Linux-syscall-note                          169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)        21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)        17
LGPL-2.1+ WITH Linux-syscall-note                          15
GPL-1.0+ WITH Linux-syscall-note                           14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)        5
LGPL-2.0+ WITH Linux-syscall-note                           4
LGPL-2.1 WITH Linux-syscall-note                            3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                  3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)                 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
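
The tagging step described above can be sketched roughly as follows. This is only an illustration and not the script Thomas and Greg used: the function name `spdx_line`, the extension checks, and the `#` fallback are assumptions. Only the uapi rule and the fact that headers and .c files need different comment types come from the text.

```python
def spdx_line(path, license_id="GPL-2.0"):
    """Return an SPDX tag line for a file that carried no license text."""
    # Rule from the text: */uapi/* files get the syscall-note exception.
    if "/uapi/" in path:
        license_id += " WITH Linux-syscall-note"
    tag = "SPDX-License-Identifier: " + license_id
    # Assumed comment styles: C sources use //, headers and assembly use /* */.
    if path.endswith(".c"):
        return "// " + tag
    if path.endswith((".h", ".S")):
        return "/* " + tag + " */"
    # Assumed fallback for Makefiles, Kconfig files and scripts.
    return "# " + tag
```

For a hypothetical path such as `arch/example/foo.S` this yields the same form as the tag at the top of this file.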
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 21:07:57 +07:00
/* SPDX-License-Identifier: GPL-2.0 */
2009-06-03 04:17:37 +07:00
#include <asm/processor.h>
2005-09-26 13:04:21 +07:00
#include <asm/ppc_asm.h>
2005-10-10 19:20:10 +07:00
#include <asm/reg.h>
2009-06-03 04:17:37 +07:00
#include <asm/asm-offsets.h>
#include <asm/cputable.h>
#include <asm/thread_info.h>
#include <asm/page.h>
2010-11-18 22:06:17 +07:00
#include <asm/ptrace.h>
2016-01-14 11:33:46 +07:00
#include <asm/export.h>
2018-07-05 23:24:57 +07:00
#include <asm/asm-compat.h>
2009-06-03 04:17:37 +07:00

2013-09-10 17:21:10 +07:00
/*
 * Load state from memory into VMX registers including VSCR.
 * Assumes the caller has enabled VMX in the MSR.
 */
_GLOBAL(load_vr_state)
	li r4,VRSTATE_VSCR
2015-02-10 05:51:22 +07:00
	lvx v0,r4,r3
	mtvscr v0
2013-09-10 17:21:10 +07:00
	REST_32VRS(0,r4,r3)
	blr
2016-01-14 11:33:46 +07:00
EXPORT_SYMBOL(load_vr_state)
powerpc/64: Don't trace code that runs with the soft irq mask unreconciled
"Reconciling", in terms of interrupt handling, means bringing the soft irq
mask state into sync with the hardware after an interrupt causes
MSR[EE] to be cleared (while the soft mask may be enabled, and hard
irqs not marked disabled).
General kernel code should not be called while unreconciled, because
local_irq_disable, etc. manipulations can cause surprising irq traces,
and it's fragile because the soft irq code does not really expect to
be called in this situation.
When exiting from an interrupt, MSR[EE] is cleared to prevent races,
but soft irq state is enabled for the returned-to context, so this is
now an unreconciled state. restore_math is called in this state, and
that can be ftraced, and the ftrace subsystem disables local irqs.
Mark restore_math and its callees as notrace. Restore a sanity check
in the soft irq code that had to be disabled for this case, by commit
4da1f79227ad4 ("powerpc/64: Disable irq restore warning for now").
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02 12:21:07 +07:00
_ASM_NOKPROBE_SYMBOL(load_vr_state); /* used by restore_math */
2013-09-10 17:21:10 +07:00

/*
 * Store VMX state into memory, including VSCR.
 * Assumes the caller has enabled VMX in the MSR.
 */
_GLOBAL(store_vr_state)
	SAVE_32VRS(0, r4, r3)
2015-02-10 05:51:22 +07:00
	mfvscr v0
2013-09-10 17:21:10 +07:00
	li r4, VRSTATE_VSCR
2015-02-10 05:51:22 +07:00
	stvx v0, r4, r3
2013-09-10 17:21:10 +07:00
	blr
2016-01-14 11:33:46 +07:00
EXPORT_SYMBOL(store_vr_state)
2013-09-10 17:21:10 +07:00

2009-06-03 04:17:37 +07:00
/*
 * Disable VMX for the task which had it previously,
 * and save its vector registers in its thread_struct.
 * Enables the VMX for use in the kernel on return.
 * On SMP we know the VMX is free, since we give it up every
 * switch (ie, no lazy save of the vector registers).
2013-10-23 15:40:02 +07:00
 *
 * Note that on 32-bit this can only use registers that will be
 * restored by fast_exception_return, i.e. r3 - r6, r10 and r11.
2009-06-03 04:17:37 +07:00
 */
_GLOBAL(load_up_altivec)
	mfmsr r5 /* grab the current MSR */
	oris r5,r5,MSR_VEC@h
	MTMSRD(r5) /* enable use of AltiVec now */
	isync

2016-05-20 01:41:34 +07:00
	/*
	 * While userspace in general ignores VRSAVE, glibc uses it as a boolean
	 * to optimise userspace context save/restore. Whenever we take an
	 * altivec unavailable exception we must set VRSAVE to something non
	 * zero. Set it to all 1s. See also the programming note in the ISA.
2009-06-03 04:17:37 +07:00
	 */
	mfspr r4,SPRN_VRSAVE
2009-12-09 01:45:45 +07:00
	cmpwi 0,r4,0
2009-06-03 04:17:37 +07:00
	bne+ 1f
	li r4,-1
	mtspr SPRN_VRSAVE,r4
1:
	/* enable use of VMX after return */
#ifdef CONFIG_PPC32
2009-07-15 03:52:54 +07:00
	mfspr r5,SPRN_SPRG_THREAD /* current task's THREAD (phys) */
2009-06-03 04:17:37 +07:00
	oris r9,r9,MSR_VEC@h
2019-12-21 15:32:38 +07:00
#ifdef CONFIG_VMAP_STACK
	tovirt(r5, r5)
#endif
2009-06-03 04:17:37 +07:00
#else
	ld r4,PACACURRENT(r13)
	addi r5,r4,THREAD /* Get THREAD */
	oris r12,r12,MSR_VEC@h
	std r12,_MSR(r1)
#endif
2016-02-29 13:53:47 +07:00
	/* Don't care if r4 overflows, this is desired behaviour */
	lbz r4,THREAD_LOAD_VEC(r5)
	addi r4,r4,1
	stb r4,THREAD_LOAD_VEC(r5)
2013-10-23 15:40:02 +07:00
	addi r6,r5,THREAD_VRSTATE
2009-06-03 04:17:37 +07:00
	li r4,1
2013-09-10 17:20:42 +07:00
	li r10,VRSTATE_VSCR
2009-06-03 04:17:37 +07:00
	stw r4,THREAD_USED_VR(r5)
2015-02-10 05:51:22 +07:00
	lvx v0,r10,r6
	mtvscr v0
2013-10-23 15:40:02 +07:00
	REST_32VRS(0,r4,r6)
2009-06-03 04:17:37 +07:00
	/* restore registers and return */
	blr
2020-03-31 23:03:44 +07:00
_ASM_NOKPROBE_SYMBOL(load_up_altivec)
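
Two small pieces of bookkeeping in load_up_altivec are easy to miss in the assembly: VRSAVE is forced to all 1s when it is zero, and the THREAD_LOAD_VEC byte is incremented with deliberate wrap-around. A minimal Python model of just those two steps is sketched below; the function names are hypothetical, and VRSAVE is treated as a 32-bit value.

```python
def fixup_vrsave(vrsave):
    """Model of mfspr/cmpwi/bne+/li/mtspr: a zero VRSAVE becomes all 1s."""
    return vrsave if vrsave != 0 else 0xFFFFFFFF


def bump_load_vec(load_vec):
    """Model of lbz/addi/stb on THREAD_LOAD_VEC: an 8-bit counter that wraps."""
    return (load_vec + 1) & 0xFF
```

The wrap in `bump_load_vec` corresponds to the comment "Don't care if r4 overflows": only the low byte is stored back by stb.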
2009-06-03 04:17:37 +07:00

/*
2016-02-29 13:53:50 +07:00
 * save_altivec(tsk)
 * Save the vector registers to its thread_struct
2009-06-03 04:17:37 +07:00
 */
2016-02-29 13:53:50 +07:00
_GLOBAL(save_altivec)
2009-06-03 04:17:37 +07:00
	addi r3,r3,THREAD /* want THREAD of task */
2013-09-10 17:21:10 +07:00
	PPC_LL r7,THREAD_VRSAVEAREA(r3)
2009-06-03 04:17:37 +07:00
	PPC_LL r5,PT_REGS(r3)
2013-09-10 17:21:10 +07:00
	PPC_LCMPI 0,r7,0
	bne 2f
	addi r7,r3,THREAD_VRSTATE
2016-02-29 13:53:50 +07:00
2:	SAVE_32VRS(0,r4,r7)
2015-02-10 05:51:22 +07:00
	mfvscr v0
2013-09-10 17:20:42 +07:00
	li r4,VRSTATE_VSCR
2015-02-10 05:51:22 +07:00
	stvx v0,r4,r7
2009-06-03 04:17:37 +07:00
	blr

#ifdef CONFIG_VSX

#ifdef CONFIG_PPC32
#error This asm code isn't ready for 32-bit kernels
#endif

/*
 * load_up_vsx(unused, unused, tsk)
 * Disable VSX for the task which had it previously,
 * and save its vector registers in its thread_struct.
 * Reuse the fp and vsx saves, but first check to see if they have
 * been saved already.
 */
_GLOBAL(load_up_vsx)
/* Load FP and VSX registers if they haven't been done yet */
	andi. r5,r12,MSR_FP
	beql+ load_up_fpu /* skip if already loaded */
	andis. r5,r12,MSR_VEC@h
	beql+ load_up_altivec /* skip if already loaded */

	ld r4,PACACURRENT(r13)
	addi r4,r4,THREAD /* Get THREAD */
	li r6,1
	stw r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
	/* enable use of VSX after return */
	oris r12,r12,MSR_VSX@h
	std r12,_MSR(r1)
powerpc/64s: Implement interrupt exit logic in C
Implement the bulk of interrupt return logic in C. The asm return code
must handle a few cases: restoring full GPRs, and emulating stack
store.
The stack store emulation is significantly simplified: rather than
creating a new return frame and switching to that before performing
the store, it uses the PACA to keep a scratch register around to
perform the store.
The asm return code is moved into 64e for now. The new logic has made
allowance for 64e, but I don't have a full environment that works well
to test it, and even booting in emulated qemu is not great for stress
testing. 64e shouldn't be too far off working with this, given a bit
more testing and auditing of the logic.
This is slightly faster on a POWER9 (page fault speed increases about
1.1%), probably due to reduced mtmsrd.
mpe: Includes fixes from Nick for _TIF_EMULATE_STACK_STORE
handling (including the fast_interrupt_return path), to remove
trace_hardirqs_on(), and fixes the interrupt-return part of the
MSR_VSX restore bug caught by tm-unavailable selftest.
mpe: Incorporate fix from Nick:
The return-to-kernel path has to replay any soft-pending interrupts if
it is returning to a context that had interrupts soft-enabled. It has
to do this carefully and avoid plain enabling interrupts if this is an
irq context, which can cause multiple nesting of interrupts on the
stack, and other unexpected issues.
The code which avoided this case got the soft-mask state wrong, and
marked interrupts as enabled before going around again to retry. This
seems to be mostly harmless except when PREEMPT=y, this calls
preempt_schedule_irq with irqs apparently enabled and runs into a BUG
in kernel/sched/core.c
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-29-npiggin@gmail.com
2020-02-26 00:35:37 +07:00
	b fast_interrupt_return
2009-06-03 04:17:37 +07:00

#endif /* CONFIG_VSX */

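
The control flow of load_up_vsx can be modelled in a few lines. The sketch below is illustrative only: the MSR bit positions are assumed values, the helper calls are represented by name rather than executed, and it assumes each helper sets its own bit in the saved MSR image (load_up_altivec does so above on 64-bit; load_up_fpu is not shown in this file).

```python
MSR_FP = 1 << 13   # assumed bit positions, for illustration only
MSR_VSX = 1 << 23
MSR_VEC = 1 << 25


def load_up_vsx(msr):
    """Return (new_msr, helpers_called) for the saved MSR image held in r12."""
    called = []
    if not msr & MSR_FP:       # andi. r5,r12,MSR_FP ; beql+ load_up_fpu
        called.append("load_up_fpu")
        msr |= MSR_FP          # assumption: the helper sets MSR_FP in r12
    if not msr & MSR_VEC:      # andis. r5,r12,MSR_VEC@h ; beql+ load_up_altivec
        called.append("load_up_altivec")
        msr |= MSR_VEC
    msr |= MSR_VSX             # oris r12,r12,MSR_VSX@h
    return msr, called
```

The model makes the branch sense explicit: beql is taken when the tested bit is clear, so a helper runs only when its state has not been loaded yet.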
2005-09-26 13:04:21 +07:00

/*
 * The routines below are in assembler so we can closely control the
 * usage of floating-point registers. These routines must be called
 * with preempt disabled.
 */
#ifdef CONFIG_PPC32
	.data
fpzero:
	.long 0
fpone:
	.long 0x3f800000 /* 1.0 in single-precision FP */
fphalf:
	.long 0x3f000000 /* 0.5 in single-precision FP */

#define LDCONST(fr, name) \
	lis r11,name@ha; \
	lfs fr,name@l(r11)
#else

	.section ".toc","aw"
fpzero:
	.tc FD_0_0[TC],0
fpone:
	.tc FD_3ff00000_0[TC],0x3ff0000000000000 /* 1.0 */
fphalf:
	.tc FD_3fe00000_0[TC],0x3fe0000000000000 /* 0.5 */

#define LDCONST(fr, name) \
	lfd fr,name@toc(r2)
#endif

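
The constants above are raw IEEE 754 bit patterns: single precision in the 32-bit variant, double precision in the TOC variant. A short Python check, with hypothetical helper names, decodes them to confirm they encode 0.0, 1.0 and 0.5.

```python
import struct


def single_from_bits(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def double_from_bits(bits):
    """Interpret a 64-bit integer as an IEEE 754 double-precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

For example, `single_from_bits(0x3f800000)` and `double_from_bits(0x3ff0000000000000)` both decode to 1.0.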
	.text
/*
 * Internal routine to enable floating point and set FPSCR to 0.
 * Don't call it from C; it doesn't use the normal calling convention.
 */
fpenable:
#ifdef CONFIG_PPC32
	stwu r1,-64(r1)
#else
	stdu r1,-64(r1)
#endif
	mfmsr r10
	ori r11,r10,MSR_FP
	mtmsr r11
	isync
	stfd fr0,24(r1)
	stfd fr1,16(r1)
	stfd fr31,8(r1)
	LDCONST(fr1, fpzero)
	mffs fr31
2006-06-10 17:18:39 +07:00
	MTFSF_L(fr1)
2005-09-26 13:04:21 +07:00
	blr

fpdisable:
	mtlr r12
2006-06-10 17:18:39 +07:00
	MTFSF_L(fr31)
2005-09-26 13:04:21 +07:00
	lfd fr31,8(r1)
	lfd fr1,16(r1)
	lfd fr0,24(r1)
	mtmsr r10
	isync
	addi r1,r1,64
	blr

/*
 * Vector add, floating point.
 */
_GLOBAL(vaddfp)
	mflr r12
	bl fpenable
	li r0,4
	mtctr r0
	li r6,0
1:	lfsx fr0,r4,r6
	lfsx fr1,r5,r6
	fadds fr0,fr0,fr1
	stfsx fr0,r3,r6
	addi r6,r6,4
	bdnz 1b
	b fpdisable

/*
 * Vector subtract, floating point.
 */
_GLOBAL(vsubfp)
	mflr r12
	bl fpenable
	li r0,4
	mtctr r0
	li r6,0
1:	lfsx fr0,r4,r6
	lfsx fr1,r5,r6
	fsubs fr0,fr0,fr1
	stfsx fr0,r3,r6
	addi r6,r6,4
	bdnz 1b
	b fpdisable

/*
 * Vector multiply and add, floating point.
 */
_GLOBAL(vmaddfp)
	mflr r12
	bl fpenable
	stfd fr2,32(r1)
	li r0,4
	mtctr r0
	li r7,0
1:	lfsx fr0,r4,r7
	lfsx fr1,r5,r7
	lfsx fr2,r6,r7
	fmadds fr0,fr0,fr2,fr1
	stfsx fr0,r3,r7
	addi r7,r7,4
	bdnz 1b
	lfd fr2,32(r1)
	b fpdisable

/*
 * Vector negative multiply and subtract, floating point.
 */
_GLOBAL(vnmsubfp)
	mflr r12
	bl fpenable
	stfd fr2,32(r1)
	li r0,4
	mtctr r0
	li r7,0
1:	lfsx fr0,r4,r7
	lfsx fr1,r5,r7
	lfsx fr2,r6,r7
	fnmsubs fr0,fr0,fr2,fr1
	stfsx fr0,r3,r7
	addi r7,r7,4
	bdnz 1b
	lfd fr2,32(r1)
	b fpdisable

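
Each of the four routines above loops over four 32-bit lanes, with r3 as the destination and r4, r5 and r6 as sources. The Python model below is a sketch of the per-lane arithmetic only; the helper `f32` and the list-based interface are assumptions made for illustration. It rounds to single precision after computing in double precision, so in rare cases it may differ in the last bit from the fused fmadds and fnmsubs instructions.

```python
import struct


def f32(x):
    """Round a Python float to IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


def vaddfp(a, b):
    # fadds fr0,fr0,fr1: a + b per lane
    return [f32(x + y) for x, y in zip(a, b)]


def vsubfp(a, b):
    # fsubs fr0,fr0,fr1: a - b per lane
    return [f32(x - y) for x, y in zip(a, b)]


def vmaddfp(a, b, c):
    # fmadds fr0,fr0,fr2,fr1: a * c + b per lane
    return [f32(x * z + y) for x, y, z in zip(a, b, c)]


def vnmsubfp(a, b, c):
    # fnmsubs fr0,fr0,fr2,fr1: -(a * c - b) per lane
    return [f32(-(x * z - y)) for x, y, z in zip(a, b, c)]
```

Note the operand order: the second source (r5) is the addend and the third (r6) is the multiplier.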
/*
 * Vector reciprocal estimate. We just compute 1.0/x.
 * r3 -> destination, r4 -> source.
 */
_GLOBAL(vrefp)
	mflr r12
	bl fpenable
	li r0,4
	LDCONST(fr1, fpone)
	mtctr r0
	li r6,0
1:	lfsx fr0,r4,r6
	fdivs fr0,fr1,fr0
	stfsx fr0,r3,r6
	addi r6,r6,4
	bdnz 1b
	b fpdisable

/*
 * Vector reciprocal square-root estimate, floating point.
 * We use the frsqrte instruction for the initial estimate followed
 * by 2 iterations of Newton-Raphson to get sufficient accuracy.
 * r3 -> destination, r4 -> source.
 */
_GLOBAL(vrsqrtefp)
	mflr r12
	bl fpenable
	stfd fr2,32(r1)
	stfd fr3,40(r1)
	stfd fr4,48(r1)
	stfd fr5,56(r1)
	li r0,4
	LDCONST(fr4, fpone)
	LDCONST(fr5, fphalf)
	mtctr r0
	li r6,0
1:	lfsx fr0,r4,r6
	frsqrte fr1,fr0 /* r = frsqrte(s) */
	fmuls fr3,fr1,fr0 /* r * s */
	fmuls fr2,fr1,fr5 /* r * 0.5 */
	fnmsubs fr3,fr1,fr3,fr4 /* 1 - s * r * r */
	fmadds fr1,fr2,fr3,fr1 /* r = r + 0.5 * r * (1 - s * r * r) */
	fmuls fr3,fr1,fr0 /* r * s */
	fmuls fr2,fr1,fr5 /* r * 0.5 */
	fnmsubs fr3,fr1,fr3,fr4 /* 1 - s * r * r */
	fmadds fr1,fr2,fr3,fr1 /* r = r + 0.5 * r * (1 - s * r * r) */
	stfsx fr1,r3,r6
	addi r6,r6,4
	bdnz 1b
	lfd fr5,56(r1)
	lfd fr4,48(r1)
	lfd fr3,40(r1)
	lfd fr2,32(r1)
	b fpdisable
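
The refinement in vrsqrtefp is the Newton-Raphson update r = r + 0.5 * r * (1 - s * r * r), applied twice to the hardware estimate. A minimal Python sketch of that update follows; the function name is hypothetical, the initial estimate is passed in by the caller because frsqrte itself is not modelled, and the arithmetic is done in double rather than single precision.

```python
def refine_rsqrt(s, estimate, iterations=2):
    """Refine an estimate of 1/sqrt(s) with Newton-Raphson steps."""
    r = estimate
    for _ in range(iterations):
        # Same update as the fmuls/fnmsubs/fmadds sequence in the loop body.
        r = r + 0.5 * r * (1.0 - s * r * r)
    return r
```

Starting from a deliberately poor estimate of 0.4 for s = 4.0, two steps bring the result to within 0.01 of the exact value 0.5, which illustrates why two iterations are enough after a reasonable hardware estimate.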