| This directory contains mpn functions for various HP PA-RISC chips. Code |
| that runs faster on the PA7100 and later implementations is in the pa7100 |
| directory. |
| |
| RELEVANT OPTIMIZATION ISSUES |
| |
| Load and Store timing |
| |
| On the PA7000, no memory instruction can issue during the two cycles after |
| a store. For the PA7100, this is reduced to one cycle. |
| |
| The PA7100 has a lockup-free cache, so it helps to schedule each load and |
| its dependent instruction as far apart as possible. |
| |
| STATUS |
| |
| 1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the |
| instructions below (but some software pipelining is needed to avoid the |
| xmpyu-fstds delay): |
| |
| fldds s1_ptr |
| |
| xmpyu |
| fstds N(%r30) |
| xmpyu |
| fstds N(%r30) |
| |
| ldws N(%r30) |
| ldws N(%r30) |
| ldws N(%r30) |
| ldws N(%r30) |
| |
| addc |
| stws res_ptr |
| addc |
| stws res_ptr |
| |
| addib Loop |
| |
| 2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb |
| (asymptotically) on the PA7100, using the instructions below. With proper |
| software pipelining and the unrolling level shown below, the speed becomes |
| 8 cycles/limb. |
| |
| fldds s1_ptr |
| fldds s1_ptr |
| |
| xmpyu |
| fstds N(%r30) |
| xmpyu |
| fstds N(%r30) |
| xmpyu |
| fstds N(%r30) |
| xmpyu |
| fstds N(%r30) |
| |
| ldws N(%r30) |
| ldws N(%r30) |
| ldws N(%r30) |
| ldws N(%r30) |
| ldws N(%r30) |
| ldws N(%r30) |
| ldws N(%r30) |
| ldws N(%r30) |
| addc |
| addc |
| addc |
| addc |
| addc %r0,%r0,cy-limb |
| |
| ldws res_ptr |
| ldws res_ptr |
| ldws res_ptr |
| ldws res_ptr |
| add |
| stws res_ptr |
| addc |
| stws res_ptr |
| addc |
| stws res_ptr |
| addc |
| stws res_ptr |
| |
| addib |
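
For reference, the arithmetic that the two instruction sequences above carry
out can be sketched in portable C. This is only an illustration of the
semantics, not the code in this directory: the names ref_mul_1 and
ref_addmul_1 are hypothetical, and 32-bit limbs are assumed (xmpyu forms a
32x32->64 bit unsigned product in the floating-point unit, which is why the
sequences store each product with fstds and reload the two halves with ldws).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routines, assuming 32-bit limbs.
   They are not GMP's implementations and make no attempt at speed. */

/* rp[0..n-1] = s1[0..n-1] * v; returns the carry-out limb. */
uint32_t ref_mul_1(uint32_t *rp, const uint32_t *s1, size_t n, uint32_t v)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32->64 product plus carry-in; cannot overflow 64 bits. */
        uint64_t p = (uint64_t)s1[i] * v + cy;
        rp[i] = (uint32_t)p;          /* low word goes to the result  */
        cy = (uint32_t)(p >> 32);     /* high word is the next carry  */
    }
    return cy;
}

/* rp[0..n-1] += s1[0..n-1] * v; returns the carry-out limb. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *s1, size_t n, uint32_t v)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* product + old result limb + carry-in still fits in 64 bits:
           (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
        uint64_t p = (uint64_t)s1[i] * v + rp[i] + cy;
        rp[i] = (uint32_t)p;
        cy = (uint32_t)(p >> 32);
    }
    return cy;
}
```

The extra ldws res_ptr loads and the second add/addc chain in the
mpn_addmul_1 sequence correspond to the "+ rp[i]" term here, which is the
only difference between the two routines.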