Note that the UNROLL option makes the 'inner' des loop unroll all 16 rounds
instead of the default 4.
RISC1 and RISC2 are 2 alternatives for the inner loop and
PTR means to use pointers arithmatic instead of arrays.

IRIX 6.2 - R10000 195mhz - cc (-O3 -n32) - UNROLL RISC2 PTR	496,000 3968k/s
solaris 2.5.1 usparc 167mhz?? - SC4.0 - UNROLL RISC1 PTR	473,500 3788k/s
solaris 2.5.1 usparc 167mhz?? - gcc 2.7.2 - UNROLL RISC1 PTR	306,000 2448k/s
IRIX 5.3 - R4400 200mhz - gcc 2.6.3 - UNROLL RISC2 PTR		235,300 1882k/s
IRIX 5.3 - R4400 200mhz - cc - UNROLL RISC2 PTR			233,700 1869k/s
linux - pentium 100mhz - gcc 2.7.0 - assember			203,000 1624k/s
NT 4.0 - pentium 100mhz - VC 4.2 - assember			201,000 1608k/s
NT 4.0 - pentium 100mhz - VC 4.2 - UNROLL RISC1 PTR?????	191,000 1528k/s
DEC Alpha 165mhz??  - cc - RISC2 PTR				181,000 1448k/s
linux - pentium 100mhz - gcc 2.7.0 - UNROLL RISC1 PTR		158,500 1268k/s
solaris 2.5.1 - sparc 10 50mhz - gcc 2.7.2 - UNROLL		123,600  989k/s
IRIX 5.3 - R4000 100mhz - cc - UNROLL RISC2 PTR			101,000  808k/s
DGUX - 88100 50mhz(?) - gcc 2.6.3 - UNROLL			 81,000  648k/s
HPUX 10 - 9000/887 - k&r cc (default compiler) - UNROLL PTR	 76,000	 608k/s
AIX - old slow one :-) - cc -					 39,000  312k/s
