the case for a soft-float calling convention on the ipaq

From: yahoo <yahoo.rm.a.t.muth.org>
Date: Wed Nov 26 2003 - 11:10:24 EST

I hope I am not restarting a discussion here that has already been beaten to
death.

My goal is to improve the floating point performance on the ipaq for xscale
and strongarm. Neither xscale nor strongarm has a floating point unit, so
fp operations are necessarily slow, but I think the situation is aggravated
a lot by the toolchains currently in use.

The toolchains used by familiar/intimate assume a fictitious fp coprocessor
modeled after the FPA11, which I believe was implemented in hardware only once,
in the ARM7500FE, and is now obsolete.

In the worst case, a single fp operation causes code for the following
steps to be generated:

1) load/move data into fp registers, e.g. ldfd
2) perform fp operation, e.g. mvfd
3) store/move data out of fp register, e.g. stfd

Also, when an fp parameter is passed to a routine, the caller uses step 1 to
move the arguments into the registers of the fictitious coprocessor, as
mandated by the calling convention.
Similarly, when the callee returns an fp result, it uses step 1 as well.

If you have objdump for arm (it might be called arm-linux-objdump when you are
cross-developing), you can look for the opcodes yourself:

    objdump -d /usr/bin/perl | grep mvfd

or just look for uses of fp registers (note the -E: without it, grep treats
the + as a literal character):

    objdump -d /usr/bin/perl | grep -E "f[0-9]+,"
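
To get a rough count rather than a screenful of matches, the same idea can be
wrapped in a small shell function. This is only a sketch: the name
count_fpa_lines is made up here, and the pattern assumes objdump's usual
tab-separated layout and that the FPA registers are printed as f0 through f7.
It has not been tuned against real ipaq binaries.

```shell
# Hypothetical helper: reads "objdump -d" output on stdin and prints the
# number of lines that have an operand named f0..f7.
# Assumption: a register operand is preceded by whitespace or a comma and
# followed by a comma, whitespace, or end of line.
count_fpa_lines() {
    grep -cE '[[:space:],]f[0-7](,|[[:space:]]|$)'
}

# Possible usage (the binary path is just the example used in this mail):
#   arm-linux-objdump -d /usr/bin/perl | count_fpa_lines
```

A count of zero would suggest the binary contains no instructions that touch
the fictitious fp registers.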

As I mentioned before, there is *no* physical coprocessor that handles these
instructions on the ipaq.
So each of them causes an illegal instruction fault.
The Linux kernel detects the fault, dispatches on the opcode, and *emulates*
the instruction.

Taking a kernel trap for every floating point operation is extremely costly.
It also wastes space, as the kernel must contain the complete FPA11 emulation
code, and a big chunk of this emulation code is probably duplicated in the math
libraries as well.

The situation can be improved somewhat by replacing the (emulated) operation
in step 2 with a call to a library function that does the emulation directly
(avoiding the kernel trap and dispatch).
Unfortunately, the library routine still has to extract the floating point
arguments from the fictitious registers of the fictitious coprocessor, and
that again requires emulated instructions. So this does not really work
either. The best solution would be to use a different calling convention that
passes fp values in general registers, not fictitious ones.
The gnu toolchain can be configured to use such a calling convention; it is
called *soft-float*.
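
One way to see whether a binary actually uses library calls for its fp
arithmetic is to look for calls to the helper routines in the disassembly.
The sketch below assumes the helpers carry gcc's usual libgcc names
(__adddf3, __subdf3, __muldf3, __divdf3 and their "sf" single precision
counterparts); those names and the function name count_softfloat_calls are
not from the toolchain described here, and conditional branches such as
blne are not counted.

```shell
# Hypothetical helper: reads "objdump -d" output on stdin and prints the
# number of plain "bl" calls to the basic libgcc soft-float arithmetic
# routines. Symbol definition lines (e.g. "00008abc <__adddf3>:") are not
# counted because they contain no "bl" mnemonic.
count_softfloat_calls() {
    grep -cE '[[:space:]]bl[[:space:]].*<__(add|sub|mul|div)[sd]f3>'
}

# Possible usage:
#   arm-linux-objdump -d ./myprog | count_softfloat_calls
```

A binary built with soft-float would be expected to show such calls wherever
it does fp arithmetic, and no uses of the fp registers at all.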

The problem with soft-float is that it is not binary compatible with
the standard calling conventions (hard-float) used by familiar/intimate.
The incompatibility becomes apparent as soon as you use a library call with
fp arguments/results, e.g. strtod() or difftime(). Those simply do not work
when the caller and the library were built with different conventions.

So my suggestion would be to switch familiar/intimate to the "soft-float"
model ASAP.

We have been using such a toolchain locally for quite a while now without
any problems. The toolchain is 100% fp instruction free, including startup
code, glibc, etc.

If there is interest, we can provide a script that will generate the complete
toolchain from a bunch of standard tarballs (gcc, binutils, glibc, etc.).

Robert Muth
Received on Wed Nov 26 16:09:23 2003
