 Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc.

 This file is part of the GNU MP Library.

 The GNU MP Library is free software; you can redistribute it and/or modify
 it under the terms of either:

   * the GNU Lesser General Public License as published by the Free
     Software Foundation; either version 3 of the License, or (at your
     option) any later version.

 or

   * the GNU General Public License as published by the Free Software
     Foundation; either version 2 of the License, or (at your option) any
     later version.

 or both in parallel, as here.

 The GNU MP Library is distributed in the hope that it will be useful, but
 WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 for more details.

 You should have received copies of the GNU General Public License and the
 GNU Lesser General Public License along with the GNU MP Library.  If not,
 see https://www.gnu.org/licenses/.


 This directory contains mpn functions for various HP PA-RISC chips.  Code
 that runs faster on the PA7100 and later implementations is in the pa7100
 directory.

 RELEVANT OPTIMIZATION ISSUES

   Load and Store timing

 On the PA7000, no memory instruction can issue in the two cycles following
 a store.  On the PA7100, this is reduced to one cycle.

 The PA7100 has a lockup-free (non-blocking) cache, so it helps to schedule
 each load as far as possible from the instruction that depends on it.
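 As a rough illustration of that scheduling advice, here is a portable C
 sketch of a software-pipelined limb-addition loop (the name
 add_n_pipelined is hypothetical, and 32-bit limbs are assumed).  The loads
 for limb i+1 are issued before the add that consumes limb i, so each load
 is a full iteration ahead of its first use.  A C compiler is free to
 reorder this, so it only shows the source-level shape of the idea, not
 what the PA7100 code in this directory actually does.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: {rp, n} = {s1, n} + {s2, n} on 32-bit limbs,
   software-pipelined so loads run one iteration ahead of their use.
   Returns the carry out of the most significant limb. */
uint32_t add_n_pipelined(uint32_t *rp, const uint32_t *s1,
                         const uint32_t *s2, size_t n)
{
    if (n == 0)
        return 0;

    /* Prologue: load the operands for limb 0. */
    uint32_t a = s1[0], b = s2[0];
    uint32_t cy = 0;
    size_t i;

    /* Kernel: load limb i+1 first, then add limb i (loaded last time). */
    for (i = 0; i + 1 < n; i++) {
        uint32_t next_a = s1[i + 1], next_b = s2[i + 1];
        uint64_t sum = (uint64_t)a + b + cy;
        rp[i] = (uint32_t)sum;
        cy = (uint32_t)(sum >> 32);
        a = next_a;
        b = next_b;
    }

    /* Epilogue: the last limb needs no further loads. */
    uint64_t last = (uint64_t)a + b + cy;
    rp[i] = (uint32_t)last;
    return (uint32_t)(last >> 32);
}
```

 The prologue/kernel/epilogue split is what keeps the loop from loading
 past the end of s1 and s2 on its final iteration.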

 STATUS

 1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100 using the
    instructions below (some software pipelining is needed to hide the
    xmpyu-to-fstds delay):

 	fldds	s1_ptr

 	xmpyu
 	fstds	N(%r30)
 	xmpyu
 	fstds	N(%r30)

 	ldws	N(%r30)
 	ldws	N(%r30)
 	ldws	N(%r30)
 	ldws	N(%r30)

 	addc
 	stws	res_ptr
 	addc
 	stws	res_ptr

 	addib	Loop
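 For reference, the instruction sequence above computes the same result as
 the plain C loop below (a sketch; the name ref_mul_1 is hypothetical and
 32-bit limbs are assumed).  The comments indicate roughly which PA-RISC
 instruction each step corresponds to: xmpyu forms the full 64-bit product
 of two 32-bit words in a floating-point register, and that product reaches
 the integer registers through the stack via fstds and two ldws.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reference for mpn_mul_1 semantics on 32-bit limbs:
   {rp, n} = {s1, n} * v, returning the most significant (carry) limb. */
uint32_t ref_mul_1(uint32_t *rp, const uint32_t *s1, size_t n, uint32_t v)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* ~xmpyu: full 32x32 -> 64-bit unsigned product */
        uint64_t p = (uint64_t)s1[i] * v;
        /* ~fstds + two ldws: split the product into its two words */
        uint32_t lo = (uint32_t)p;
        uint32_t hi = (uint32_t)(p >> 32);
        /* ~addc: add the carry limb to the low word and propagate the
           carry into the high word (hi <= 2^32 - 2, so it cannot wrap) */
        lo += cy;
        hi += (lo < cy);
        rp[i] = lo;   /* ~stws */
        cy = hi;
    }
    return cy;
}
```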

 2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
    (asymptotically) on the PA7100 using the instructions below.  With proper
    software pipelining and the unrolling level shown, the speed becomes 8
    cycles/limb.

 	fldds	s1_ptr
 	fldds	s1_ptr

 	xmpyu
 	fstds	N(%r30)
 	xmpyu
 	fstds	N(%r30)
 	xmpyu
 	fstds	N(%r30)
 	xmpyu
 	fstds	N(%r30)

 	ldws	N(%r30)
 	ldws	N(%r30)
 	ldws	N(%r30)
 	ldws	N(%r30)
 	ldws	N(%r30)
 	ldws	N(%r30)
 	ldws	N(%r30)
 	ldws	N(%r30)
 	addc
 	addc
 	addc
 	addc
 	addc	%r0,%r0,cy-limb

 	ldws	res_ptr
 	ldws	res_ptr
 	ldws	res_ptr
 	ldws	res_ptr
 	add
 	stws	res_ptr
 	addc
 	stws	res_ptr
 	addc
 	stws	res_ptr
 	addc
 	stws	res_ptr

 	addib
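 Unlike mpn_mul_1, mpn_addmul_1 adds {s1, n} * v to the limbs already at
 res_ptr, which is why this sequence also loads from res_ptr.  A minimal C
 sketch of those semantics (the name ref_addmul_1 is hypothetical, and
 32-bit limbs are assumed) fits each step in a 64-bit intermediate:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reference for mpn_addmul_1 semantics on 32-bit limbs:
   {rp, n} += {s1, n} * v, returning the carry limb out of the top. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *s1, size_t n, uint32_t v)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* product + existing result limb + carry limb; the maximum is
           (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so this never overflows */
        uint64_t t = (uint64_t)s1[i] * v + rp[i] + cy;
        rp[i] = (uint32_t)t;          /* low word goes back to res_ptr */
        cy = (uint32_t)(t >> 32);     /* high word becomes the carry limb */
    }
    return cy;
}
```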

 3. For the PA8000 we have to stick to 32-bit limbs until compiler support
    emerges.  But we want to use 64-bit operations whenever possible, in
    particular for loads and stores.  mpn_add_n can be handled efficiently
    by rotating when s1 and s2 are aligned, and by masking and bit-field
    insertion when they are not.  This should double the speed compared to
    the current code.
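 Under one reading of the note above, "rotating" works like this: PA-RISC
 is big-endian, so a 64-bit load of two consecutive 32-bit limbs puts the
 less significant limb in the high half.  Rotating the word by 32 bits
 makes it a proper 64-bit number that can be added in one operation, and
 rotating back restores the memory layout for a 64-bit store.  The portable
 C sketch below assumes n is even and that all operands are doubleword
 aligned; the names are hypothetical, the big-endian load and store are
 simulated explicitly so it behaves the same on any host, and the masking
 plus bit-field-insert path for the unaligned case is not shown.

```c
#include <stdint.h>
#include <stddef.h>

/* Simulates a big-endian 64-bit load of p[0], p[1]: the limb at the
   lower address (the less significant limb) lands in the high half. */
static uint64_t be_load64(const uint32_t *p)
{
    return ((uint64_t)p[0] << 32) | p[1];
}

/* Simulates the matching big-endian 64-bit store. */
static void be_store64(uint32_t *p, uint64_t w)
{
    p[0] = (uint32_t)(w >> 32);
    p[1] = (uint32_t)w;
}

/* Swap the two 32-bit halves of a 64-bit word. */
static uint64_t rot32(uint64_t w)
{
    return (w << 32) | (w >> 32);
}

/* Hypothetical aligned-case add_n: {rp, n} = {s1, n} + {s2, n} with n even.
   Returns the carry out of the most significant limb. */
uint32_t add_n_rot(uint32_t *rp, const uint32_t *s1,
                   const uint32_t *s2, size_t n)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i += 2) {
        /* After rotating, limb i is the low half and limb i+1 the high
           half, so the pair can be added as one 64-bit number. */
        uint64_t a = rot32(be_load64(s1 + i));
        uint64_t b = rot32(be_load64(s2 + i));
        uint64_t s = a + b;
        uint64_t c1 = s < a;          /* carry out of a + b */
        uint64_t t = s + cy;
        uint64_t c2 = t < s;          /* carry out of adding the carry-in */
        cy = c1 | c2;                 /* at most one of these can be set */
        be_store64(rp + i, rot32(t)); /* rotate back before storing */
    }
    return (uint32_t)cy;
}
```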


 LABEL SYNTAX

 The HP-UX assembler takes labels starting in column 0 with no colon,

 	L$loop  ldws,mb -4(0,%r25),%r22

 Gas on hppa GNU/Linux, however, requires a colon,

 	L$loop: ldws,mb -4(0,%r25),%r22

 This is covered by using LDEF() from asm-defs.m4.  An alternative would be
 to use ".label", which is accepted by both,

 		.label  L$loop
 		ldws,mb -4(0,%r25),%r22

 but that's not as nice to look at, at least not if you're used to
 assembler code having labels in column 0.


 REFERENCES

 Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998,
 part number 92432-90012.


 ----------------
 Local variables:
 mode: text
 fill-column: 76
 End: