RD, 25.2.93

From the instruction count alone, one could expect at most 15/100 sec
for the standard multiplication instead of the observed 21/100. The
difference is due to branch delays totaling 3 cycles, only one of which
can be filled by the delay-slot instruction, and to a load delay of at
least one cycle, possibly more if the loaded value is used immediately.
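To make the cycle accounting concrete, here is a minimal C sketch of the kind of inner loop standard multiplication runs, assuming 32-bit digits. The name digit_vec_mult_add and its signature are hypothetical and not taken from this code base. Each iteration performs two loads, one store, one 32x32->64 product (the operation umul provides on SPARC) and a loop branch. The branch and load delays described above are paid on each of these iterations.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper (not the actual routine):
   acc[0..n-1] += a[0..n-1] * d, returns the carry-out digit.
   Digits are 32 bits; the 32x32->64 product is what umul computes. */
uint32_t digit_vec_mult_add(uint32_t *acc, const uint32_t *a, size_t n, uint32_t d)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* a[i] is used right after it is loaded: a possible load-use stall. */
        uint64_t t = (uint64_t)a[i] * d + acc[i] + carry; /* cannot overflow 64 bits */
        acc[i] = (uint32_t)t;          /* low half stays in this digit */
        carry = (uint32_t)(t >> 32);   /* high half moves to the next digit */
    }
    return carry;                      /* loop branch taken n times above */
}
```

For example, adding {0xFFFFFFFF, 2} * 3 to {1, 0} leaves {0xFFFFFFFE, 8} with a carry-out of 0.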

For 64-bit multiplication, all of this overhead should be hidden in the
time consumed by the umul instruction, so it is not too bad. The
situation is worse for other functions, e.g. additions and bgcd.
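By contrast, a digit-vector addition does very little arithmetic per iteration. The same branch and load delays therefore make up a larger share of its running time, which is presumably why additions suffer more. The following is a minimal sketch, assuming 32-bit digits and a hypothetical helper named digit_vec_add. Each iteration performs two loads, one store and two adds with carry detection, and there is no long-latency instruction to hide the delays behind.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper (not the actual routine):
   r[i] = a[i] + b[i] + carry over n 32-bit digits; returns the final carry (0 or 1). */
uint32_t digit_vec_add(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = a[i] + carry;   /* wraps to 0 only if a[i] == 0xFFFFFFFF and carry == 1 */
        carry = (s < carry);         /* carry out of the first addition */
        s += b[i];
        carry += (s < b[i]);         /* carry out of the second addition; total stays <= 1 */
        r[i] = s;
    }
    return carry;
}
```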


RD, 3.3.93

There seems to be no easily usable "udiv" instruction; perhaps later
versions of the processor have one. Despite the optimized
DigitVecMultSub, division performance is bad.
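For reference, here is a minimal sketch of the two digit-level steps of schoolbook long division, assuming 32-bit digits. The first is a multiply-subtract in the spirit of DigitVecMultSub. The name digit_vec_mult_sub and its signature are hypothetical, not the actual routine. The second is the initial quotient-digit estimate from Knuth's Algorithm D (estimate_qdigit, also a hypothetical name). That estimate needs one 64-by-32-bit division per quotient digit. Without a usable hardware divide, it would typically be compiled to a call to a software division routine, and this cost remains no matter how fast the multiply-subtract is.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper (not the actual DigitVecMultSub):
   r[0..n-1] -= a[0..n-1] * q, returns the borrow out of the top digit. */
uint32_t digit_vec_mult_sub(uint32_t *r, const uint32_t *a, size_t n, uint32_t q)
{
    uint32_t borrow = 0;                              /* still to subtract from r[i] */
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)a[i] * q + borrow;     /* fits in 64 bits */
        uint32_t lo = (uint32_t)p;
        uint32_t hi = (uint32_t)(p >> 32);
        uint32_t old = r[i];
        r[i] = old - lo;                              /* wraps modulo 2^32 */
        borrow = hi + (old < lo);                     /* cannot overflow */
    }
    return borrow;
}

/* Hypothetical helper: quotient-digit estimate min(floor((u1*B + u0) / v1), B - 1)
   with B = 2^32. The 64-by-32 division below is the step that needs udiv. */
uint32_t estimate_qdigit(uint32_t u1, uint32_t u0, uint32_t v1)
{
    if (u1 >= v1)
        return UINT32_MAX;                            /* clamp to B - 1 */
    uint64_t num = ((uint64_t)u1 << 32) | u0;
    return (uint32_t)(num / v1);                      /* result < 2^32 since u1 < v1 */
}
```

In Algorithm D the estimate may still be one or two too large. After digit_vec_mult_sub, a nonzero borrow signals that the quotient digit must be decremented and the divisor added back.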
