Since I released NSEA I've had a fair bit of feedback on it, chiefly from
Robert Ames (mirage1@gpu.utcs.utoronto.ca) who has made some valuable comments
on NSEA and run various tests on it, one of which showed up a small problem
(not in NSEA itself but in the way keys were set up).

The original distribution of the NSEA code envisaged people using only ASCII
keys (obtained from the user via a gets()), and used '\0' as a delimiter for
the end of the key.  In order to make the use of binary keys possible, I have
changed the code slightly to take an array of bytes and a byte count as the key
instead of a null-terminated array of chars.  Due to the use of binary keys
instead of ASCII ones, this slightly altered version of NSEA is no longer
compatible with the previous one.  If you've been using NSEA and want only
ASCII keys, go ahead and use the older version, though it would be nice if you
could switch to this newer version since it allows a more flexible range of
keys (it's actually impossible to enter a full binary key from the keyboard,
but there may be cases where these keys are generated by other programs
incorporating NSEA as a general-purpose encryption routine).

In addition I didn't like the fact that using an LFSR random number generator
with a constant seed to generate the initial S-Boxes represents a weakness in
the anti-brute-force nature of the algorithm, since the initial S-Boxes can be
precalculated to speed up a brute-force attack.  I used an idea I had had
previously - that of salting the RNG when NSEA is used in a similar manner to
the Unix password-checking code - and modified it to use a salt determined by
the password used rather than some constant predetermined value.  This results
in far too many possible initial S-Boxes to allow any precomputation.  However,
since LFSRs perform quite badly for some initial values (ones with few 1-bits
in them), I have substituted an LCRNG in its place.

Finally, I have improved the sample en/decryption code somewhat.  It appears
that people were actually using it as encryption software, when it was really
only meant as an example of how to use NSEA.  The new code writes a 4-byte ID
header to the encrypted data, as well as an IV seed, which is followed by the
actual encrypted data.  It's not much, but it's usable.  I'll be releasing a
much more sophisticated program using NSEA Real Soon Now.

Due to the increase in size of the NSEA code, this posting only includes the
general description of NSEA.  The full NSEA distribution is available from:

    nic.funet.fi    (Europe)
    wimsey.bc.ca    (America)

Note that both these sites are outside the US and thus not affected by US
export restrictions.

The full distribution includes the following files:

  NSEA.TXT	- This file
  NSEA.H	- NSEA routines interface header
  NSEA.C	- NSEA routines
  NSEA.ASM	- 16- and 32-bit 80x86 NSEA asm.routines
  NSEA.OBJ	- Assembled version of the above (compact mem.model)
  NSEAMAIN.C	- Driver routines for NSEA.C
  NCYCLE.C	- Cycling code for NSEA
  NGCCYLE.C	- Robert Ames' graphical cycler
  NCRACK.C	- A simple NSEA dictionary cracker

NSEA.ASM includes the block encrypt() routine needed for CFB-mode encryption,
as well as code to initialize the sBoxes.  The latter is intended mainly for
use in brute-force attack attempts on NSEA.

The NSEA description follows.....

				-----------------

This is a description of an encryption algorithm I've designed which (for
various reasons explained later) I've called NSEA.  It came to me about a year
ago as I was looking at how CRC's are calculated by table lookup.  If this
method is combined with the mechanism used by the DES (where the data is broken
into two halves and each half alternately used to encrypt the other half) this
makes a simple and fast encryption algorithm.


Derivation of the algorithm:
----------------------------

The standard way of calculating a CRC by table lookup works as follows:  For
every byte of data do:

    crc = crcTable[ ( BYTE ) crc ^ data ] ^ ( crc >> 8 );
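
As a concrete illustration of the update step above, here is a minimal sketch
of the ordinary (non-keyed) table-driven CRC-32.  The function names and the
choice of the common reflected polynomial 0xEDB88320 are assumptions made for
this example and are not taken from the NSEA sources:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crcTable[256];

/* Build the standard CRC-32 table (reflected polynomial 0xEDB88320).
   A keyed variant would fill this table in a key-dependent way instead. */
void initCrcTable(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320UL : (c >> 1);
        crcTable[i] = c;
    }
}

/* Run the one-line update from the text over every byte of a buffer.
   initCrcTable() must have been called first. */
uint32_t crc32Buffer(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFUL;            /* conventional start value */

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        crc = crcTable[(unsigned char)crc ^ data[i]] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFUL;              /* conventional final XOR */
}
```

With this table, the nine ASCII bytes "123456789" give the well-known CRC-32
check value 0xCBF43926.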

At the end of this process we have a (hopefully unique) crc value based on
every byte of the data.  Normally the crcTable is set up using some standard
algorithm, however if we change the way the table is set up based on a
user-supplied key, the final crc value will again be (hopefully unique) based
on the initial key.  This can be used as the basis for an encryption
algorithm.  Consider initially the DES algorithm....

Two rounds of the standard DES algorithm are shown in the following diagram.  L
and R are the left and right halves of each 64-bit input block, K0...K15 are
the 16 keys from the key schedule calculation, and f() is the DES f-function
(the one involving the complex mungeing of data through S-boxes which makes the
algorithm so slow when implemented in software).  The values next to each data
path are the number of bits used, and (+) is the XOR function:

    +-------------------+	 +-------------------+
    |	      L0	|   K0	 |	   R0	     |
    +-------------------+   |	 +-------------------+
	      |32	    |48 	   |32
	      v 	    v	    32	   v
	     (+)<----------f()<------------|
	      | 			   |
	      \------------\ /-------------/
			    X
	      /------------/ \-------------\
	      |32			   |32
	      v 			   v
    +-------------------+	 +-------------------+
    |	   L1 = R0	|   K1	 | R1 = L0 ^ f(R0,K0)|
    +-------------------+   |	 +-------------------+
	      |32	    |48 	   |32
	      v 	    v	    32	   v
	     (+)<----------f()<------------|
	      | 			   |
	      \------------\ /-------------/
			    X
	      /------------/ \-------------\
	      |32			   |32
	      v 			   v
    +-------------------+	 +-------------------+
    |	   L2 = R1	|	 | R2 = L1 ^ f(R1,K1)|
    +-------------------+	 +-------------------+

This can be 'unwound' (at the expense of making the labels a bit confusing) to
give:

    +-------------------+	 +-------------------+
    |	      L0	|  K0	 |	   R0	     |
    +-------------------+   |	 +-------------------+
	      |32	    |48 	   |32
	      v 	    v	    32	   v
	     (+)<----------f()<------------|
	      | 			   |
	      v 			   v
    +-------------------+	 +-------------------+
    | R1 = L0 ^ f(R0,K0)|  K1	 |	L1 = R0      |
    +-------------------+   |	 +-------------------+
	      |32	    |48 	   |32
	      v 	    v	    32	   v
	      |------------f()----------->(+)
	      | 			   |
	      v 			   v
    +-------------------+	 +-------------------+
    |	   L2 = R1	|	 | R2 = L1 ^ f(R1,K1)|
    +-------------------+	 +-------------------+

In comparison, the algorithm NSEA uses is as follows:

    +-------------------+	 +-------------------+
    |	      L0	|	 |	   R0	     |
    +-------------------+	 +-------------------+
	      |64			   |64
	      v      64 	    8	   v
	     (+)<----------f()<------------|
	      | 			   |
	      v 			   v
    +-------------------+	 +-------------------+
    |  L1 = L0 ^ f(R0)	|	 |   R1 = R0 rol 8   |
    +-------------------+	 +-------------------+
	      |64			   |64
	      v      8		   64	   v
	      |------------f()----------->(+)
	      | 			   |
	      v 			   v
    +-------------------+	 +-------------------+
    |	L2 = L1 rol 8	|	 |  R2 = R1 ^ f(L1)  |
    +-------------------+	 +-------------------+

Since only 8 bits of each block are input to the f-function, NSEA needs to run
for ( blockSize / 8 ) rounds to process a single block.  The key input K needed
in the DES algorithm is implicit in the f-function, which is simply a lookup
table with 256 64-bit values which are set up in a key-dependent manner as
described below.  Instead of shifts like the crc method uses we use rotates
since we can't afford to keep losing 8 bits of data in every round.


Outline:
--------

First, standard S-boxes are set up using an LCRNG random-number generator seeded
with a key-dependent value to ensure they are always identical initially.  An
additional unique seeding value can be used to deter traffic analysis on
identical messages encrypted with identical passwords.  The parameters for the
LCRNG are chosen to disallow any shortcuts which could normally be used to
speed up the generation of pseudorandom numbers.  Then the key, which can be
any length (not just 56 bits like one well-known algorithm), is encrypted to
give 256 blocks of random (but based on the key used) data.  The S-boxes are
then re-initialized using this key.  This is the initial phase of the
encryption.

Encryption works a bit like it does in DES, by breaking the data block into two
halves and alternately encrypting the left and right sides.  After each
encryption operation the data halves are rotated by 8 bits to shift a new byte
into position for the next S-box lookup.  Decryption is simply the reverse of
this.

When used to encrypt files, the algorithm is run in CFB mode.  Code for a CBC
mode encryption program is also present, though the CFB mode is preferred.

More on the Key Setup:
----------------------

In the key setup process, the initial key provided by the user is only used to
seed (and update) the NSEA pseudorandom number generator, but is never directly
involved in any computation.  NSEA can have up to 2K of initial key material.
If the key is shorter than 2K, NSEA pads the remainder out with zeroes.  The
avalanche effect of the encryption algorithm takes care of changing this
padding to pseudorandom (but based on the key used) values.

This key setup method provides 2K - N bytes of known key material, however it
is the N bytes of actual key which gives the security (even if those N bytes
are preprocessed in some way to get, say, 10N bytes, this still only gives N
bytes of actual key material).  NSEA sets the initial key as follows:

    16 bits         : Key length (in bytes)
    N bytes         : Actual key
    2K - (N+2) bytes: Zero padding

The length prefix is necessary in case the key ends with one or more zero
bytes, since this would be indistinguishable from the zero padding.  It is
important to note that the zero padding bytes are never used in any of the
setup computations - it is the N bytes of actual key which provides the
security by seeding (and updating) the NSEA random number generator.
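
The layout above can be sketched as follows.  This is a minimal illustration
only: the function name is hypothetical, and the byte order of the 16-bit
length (most significant byte first here) is an assumption, since the text
does not specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_BUFFER_SIZE  2048   /* "2K" of initial key material */
#define KEY_LENGTH_BYTES 2      /* 16-bit length prefix */

/* Lay out: 16-bit key length, N bytes of key, zero padding to 2K.
   Returns 0 on success, -1 if the key does not fit. */
int buildKeyBuffer(const unsigned char *key, size_t keyLen,
                   unsigned char out[KEY_BUFFER_SIZE])
{
    assert(out != NULL && (key != NULL || keyLen == 0));
    if (keyLen > KEY_BUFFER_SIZE - KEY_LENGTH_BYTES)
        return -1;

    memset(out, 0, KEY_BUFFER_SIZE);                /* zero padding */
    out[0] = (unsigned char)(keyLen >> 8);          /* assumed byte order */
    out[1] = (unsigned char)(keyLen & 0xFF);
    if (keyLen > 0)
        memcpy(out + KEY_LENGTH_BYTES, key, keyLen);
    return 0;
}
```

With this layout the one-byte key { 0x41 } and the two-byte key { 0x41, 0x00 }
differ only in the length prefix, which is exactly the case the prefix is
there to handle.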


More on CFB Mode:
-----------------

The usual way to implement CFB encryption is to use a block cipher as a random
number generator and XOR its output a bit at a time with the cleartext
(discarding the rest of the block and feeding the ciphertext bit back into the
block cipher to generate the next random number).  The NSEA code actually XOR's
a block at a time - there's nothing special about this, it just makes it a lot
faster.  This looks as follows:

    +-----------------+
    |		      | Block cipher output
    +-----------------+
	    (+)
    +-----------------+
    |		      | Cleartext
    +-----------------+
	     |
	     v
    +-----------------+
    |		      | Ciphertext (fed back into block cipher to produce
    +-----------------+ 	    next block of output)

A more detailed description of how NSEA handles CFB mode is as follows ('IV' is
the initialization vector, '[B]' is the block cipher (in this case NSEA, but it
could be any block cipher, eg DES), 'OUTx' is the x-th block cipher output
block, 'Px' is the x-th plaintext block, 'Cx' is the x-th ciphertext block):

  IV -> [B] -> OUT1
	       (+)
		P1 -> C1

		      C1 -> [B] -> OUT2
				   (+)
				    P2 -> C2

					  C2 -> [B] -> OUT3
						       (+)
							P3 -> C3

(What this is supposed to show is that the IV is encrypted and the resulting
output is XOR'd with plaintext block 1 to produce ciphertext block 1.  This is
then fed back into the block encryption algorithm to produce output block 2,
which is XOR'd with plaintext block 2, etc etc).  It's probably easier to look
at the source code than try and decipher this diagram :-).

Now let's assume a chosen plaintext attack, in which you can choose Px.  You can
observe Cx, and using this knowledge can deduce OUTx.  Thus you have gone from
a chosen-plaintext attack on the CFB cipher to a known-plaintext attack on the
block cipher itself - not exactly an improvement.

In addition, CFB as implemented in the NSEA code allows en/decryption of data in
arbitrary-size blocks (ie it's not sensitive to buffer sizes - you can encrypt
data in groups of 13 bytes for each encryptCFB() call and decrypt in groups of
51 bytes for each decryptCFB() call).  When used in CBC mode, data must be
en/decrypted in multiples of the cipher block size (16 bytes for the current
version of NSEA) for each encrypt()/decrypt() call.  The CFB code allows
decryption of 1 byte per decryptCFB() call, or 1000 bytes per decryptCFB()
call, it makes no odds.
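
One way to get this buffer-size independence is to keep the feedback block and
a position counter between calls.  The following is a rough sketch of that
idea, not the encryptCFB()/decryptCFB() code from the distribution: all names
are hypothetical, and toyBlockEncrypt() is a deliberately trivial stand-in for
the block cipher [B] (it is NOT NSEA and NOT secure).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16

typedef struct {
    unsigned char feedback[BLOCK_SIZE]; /* IV, then ciphertext as produced */
    unsigned char output[BLOCK_SIZE];   /* current block cipher output */
    size_t used;                        /* bytes of output[] consumed */
} CfbState;

/* Toy stand-in for the block cipher; any keyed block encryption fits here. */
static void toyBlockEncrypt(unsigned char block[BLOCK_SIZE])
{
    for (int round = 0; round < 4; round++)
        for (int i = 0; i < BLOCK_SIZE; i++)
            block[i] = (unsigned char)(block[i] * 5 +
                       block[(i + 1) % BLOCK_SIZE] + 0x3B + round);
}

void cfbInit(CfbState *s, const unsigned char iv[BLOCK_SIZE])
{
    assert(s != NULL && iv != NULL);
    memcpy(s->feedback, iv, BLOCK_SIZE);
    s->used = BLOCK_SIZE;               /* forces a refill on first byte */
}

/* Encrypt the completed feedback block to get the next output block. */
static void cfbRefill(CfbState *s)
{
    memcpy(s->output, s->feedback, BLOCK_SIZE);
    toyBlockEncrypt(s->output);
    s->used = 0;
}

void cfbEncrypt(CfbState *s, unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s->used == BLOCK_SIZE)
            cfbRefill(s);
        buf[i] ^= s->output[s->used];
        s->feedback[s->used] = buf[i];  /* ciphertext is fed back */
        s->used++;
    }
}

void cfbDecrypt(CfbState *s, unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (s->used == BLOCK_SIZE)
            cfbRefill(s);
        buf[i] = (unsigned char)(c ^ s->output[s->used]);
        s->feedback[s->used] = c;       /* ciphertext is fed back */
        s->used++;
    }
}
```

Because the state carries over between calls, data encrypted in 13-byte
pieces can be decrypted in 51-byte pieces (or one byte at a time) as long as
both sides start from the same IV.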


More on the use of an IV:
-------------------------

The use of the IV in NSEA is to hide patterns in the data (in the same way that
CBC or CFB mode is preferred over ECB mode).  However if the IV is always the
same initially, then two identical messages encrypted with an identical
password will be the same, allowing traffic analysis.  If one message (or part
of one message) is known, then the contents of the second message can be
guessed at.  This weakness occurs if the same IV is always used, and is avoided
by using a random IV for each encryption.   The avalanche effect of the cipher
thus ensures that even the first block is completely different.

The fact that the IV is exposed is not seen as a major worry.  Take for example
CBC mode, which works as follows (pN is plaintext block N, cN is ciphertext
block N):

    IV (+) p1 -> [DES] -> c1

    c1 (+) p2 -> [DES] -> c2

Note that c1, c2, ... are known to an attacker.  Now call the IV 'c0'.  Does it
matter if the attacker knows c0?  Not really - if they don't know c0 they can
just use c1, c2, ... for an attack.  Hiding c0 is of little use - its main
purpose is to mask patterns/stop traffic analysis.

NSEA allows the insertion of a random IV seed at two points, either when the
key setup is performed or when the IV itself is set up.  The distinction
between these two is relevant only if multiple pieces of data are to be
encrypted with the same password.  If the seeding is performed at the key setup
stage, the key setup must be performed anew for each file (a lengthy process).
If the seeding is performed when the IV itself is set up, this resetting of the
keying information is unnecessary.  However this has the disadvantage that an
attacker need only perform the key setup once for an attack on a multitude of
files.  The choice of seeding the keying information or the IV allows a
tradeoff to be made between encryption speed and security.


Extending the algorithm:
------------------------

There are several ways to extend the algorithm, among them being the use of
multiple sets of S-boxes for different rounds of the encryption, and the
constant updating of S-boxes as encryption progresses (so they never remain
constant as they do at the moment).  The former requires largeish amounts of
memory, the latter largeish amounts of time, and I'm not sure whether they're
worth bothering with.

Its advantages over the DES are:

  - No limit on key length (the basic version has a 2K limit, with multiple
    sets of S-boxes this can be extended).
  - Easy to implement
  - FAST!
  - 128-bit block size
  - Can easily be extended to give more security
  - Comes from outside the US, so not subject to export restrictions
  - Like Ralph Merkle's Khufu, it spends a lot of time setting up its initial
    S-boxes, which (along with the 128-bit block size) makes brute-force
    attacks difficult.

Its disadvantages over the DES are:

  - Hasn't been subject to intense scrutiny for 15 years and may have security
    holes in it big enough to drive a bus through.


Speed:
------

This (pretty non-optimal) version when compiled with -DTIME_TRIAL using a
highly pessimizing compiler ran at just under 50K/s on a 386/25 - an
assembly-language version would probably manage one or two hundred K/s on the
same system (in comparison a highly optimized assembly-language DES
implementation (written in Moscow:-) ran at around 30K/s on the same machine).

A 32-bit version running on a faster system (486/50) under MSDOS achieves
905K/s with -DTIME_TRIAL (the equivalent 16-bit version achieves 591 K/s, the C
version 170 K/s).  Due to the peculiar way 32-bit code works under MSDOS it is
expected that the true throughput for the 32-bit version is well over 1MB/sec.
An additional speedup could be obtained by running it in a different memory
model when in 16-bit mode (the compact memory model was chosen as being
representative of typical program usage).

The key setup is quite slow - the same nonoptimized version as above takes 80ms
to set up one set of S-boxes from an initial key.  Allowing for an
order-of-magnitude speedup for an assembly-language implementation (or a better
CPU:-), this allows a brute-force attack on the encryption key of about 125
attempts a second (in fact the assembly-language version when run on a 486/50
takes 9.2ms to set up one set of S-boxes, for 109 attempts a second).  This
gives the following brute-force cracking times for worst-case keys (lowercase
letters only):

       Key length	Time (est)	Time (486/50)
       ----------	----------	-------------
	4 chars 	  1 hour	 1.2 hours
	5 chars 	 26 hours	  30 hours
	6 chars 	 29 days	  33 days
	7 chars 	  2 years	 2.3 years
	8 chars 	 53 years	  61 years
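
The figures in the table are simply the full keyspace (26 ^ length) divided by
the attempt rate, i.e. the time to try every possible key.  A small sketch of
that arithmetic follows; the function name is hypothetical:

```c
#include <assert.h>

/* Seconds needed to try every key of the given length over the given
   alphabet at a fixed number of key setups per second. */
double bruteForceSeconds(unsigned keyLength, double alphabetSize,
                         double attemptsPerSecond)
{
    double keys = 1.0;

    assert(alphabetSize > 0.0 && attemptsPerSecond > 0.0);
    for (unsigned i = 0; i < keyLength; i++)
        keys *= alphabetSize;           /* alphabetSize ^ keyLength */
    return keys / attemptsPerSecond;
}
```

For example, bruteForceSeconds( 6, 26.0, 125.0 ) / 86400.0 comes to roughly
28.6 days, which is the "29 days" entry above.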

Raising the minimum key length to encourage the use of phrases as keys rather
than individual words increases this time greatly (as well as making the work
of smart dictionary crackers difficult, since the dictionary used would have to
contain complete phrases, not just individual words).  Certainly 29 days for a
6-character key is too short - your average group of workstations could
brute-force the key in hours or even minutes.  Similarly, relying on the slow
key setup for security may be dangerous - look at what happened with the
password security of a certain OS which relied on the slow running of the DES
algorithm on a PDP-11.  On the other hand virtually all amateur cracking
attempts will be along the lines of a brute-force attempt, so an algorithm
which pessimizes performance for this form of attack is a step in the right
direction.

It is possible to further slow down the key setup by performing some extra
manipulation of the key material and using this as padding instead of the zero
padding currently used.  Robert Ames has suggested raising the number
represented by the key to some power and then taking the 1024 most significant
and 1024 least significant digits.  Any algorithm which is inherently slow and
non-parallelizable can be used in this way to slow down the initial key setup
(without affecting the encryption speed itself).


Salting NSEA:
-------------

NSEA can be used as a password hashing algorithm in the same way that the DES
is now used by Unix, with the salt being the value used to initialize the
LCRNG which provides the pseudorandom numbers for the initial S-Boxes.  The
salt is a 32-bit value (though only 30 bits of this are actually used), and is
an integral part of NSEA.  Varying the salt rules out the use of a table of
precomputed encrypted passwords for fast password checking, since each password
can have a total of just over 1 billion encrypted values depending on the salt
used.
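
To illustrate how a salt can select one of 2^30 (1,073,741,824) starting
tables, here is a rough sketch.  It is not the NSEA generator: the function
names are hypothetical, the multiplier and increment are generic textbook
LCRNG constants rather than NSEA's parameters, and the use of the low-order 30
bits of the salt is an assumption (the text does not say which 30 bits are
used).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SALT_MASK 0x3FFFFFFFUL  /* assumed: low 30 bits of the salt count */

/* Generic 32-bit linear congruential step (textbook constants, NOT the
   parameters used by NSEA). */
static uint32_t lcgNext(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill 256 64-bit table entries from an LCRNG seeded with the salt. */
void initialSBoxes(uint32_t salt, uint64_t sBox[256])
{
    uint32_t state = salt & SALT_MASK;

    assert(sBox != NULL);
    for (int i = 0; i < 256; i++) {
        uint64_t hi = lcgNext(&state);
        uint64_t lo = lcgNext(&state);
        sBox[i] = (hi << 32) | lo;
    }
}
```

In this sketch two salts that differ only in the top two bits give the same
table, while any difference in the remaining 30 bits gives a different one.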


Now what?:
----------

I'm releasing this code into the public domain - do whatever you want with it.
Since it comes from outside the US there aren't any silly restrictions on it,
so anyone should be able to use it.

For some reason it's become known as "Nonpatented Simple Encryption Algorithm"
(NSEA) - it pretty well sums up its main features :-).

As it stands the code can be used to build five separate programs:

-DTEST_VERSION	    will create an executable which generates a series of test
		    values running NSEA in CBC mode.
-DTIME_TRIAL	    will create an executable which encrypts 10MB of data in
		    CFB mode for timing purposes.
-DSETUP_TRIAL	    will create an executable which performs 1000 key setups
		    for timing purposes to evaluate brute-force cracking
		    attempts.
-DUSE_CBC	    will create a simple file en/decryption program which uses
		    NSEA in CBC mode.
Straight compile    will create a simple file en/decryption program which uses
		    NSEA in CFB mode.

In addition there are two options to include 80x86 assembly language code to
speed up NSEA when run on a PC.  These require the linking in of the file
NSEA.OBJ (derived from NSEA.ASM), and are:

-DASM_ENCRYPT	    includes extra code for the 80x86 16-bit fast encrypt()
		    routine.
-DASM_ENCRYPT32     includes extra code for the 80x86 32-bit even-faster
		    encrypt() routine.

Finally, if anyone uses this algorithm for anything or knows of any way to
improve it I'd be interested to hear from them.


Test data:
----------

The following output was obtained when running NSEA in CBC mode using
-DTEST_VERSION and a constant salt of 0x00000000:

Key 00000000000000000000000000000000, data 00000000000000000000000000000000:
    890EEFCD0962C4DA5915EFD33192A97A E19D4C5D10D966305B609E3A2A5680E8
    B31BB5F22AFE48FD994C12A3FFA8AFE7 03B753CA0C4E17A465AA2DBDABB9BF2A

Key FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, data FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:
    52F901699FB05D7E7745409AD093CCAE C90EE89E7711359236EAF8B74698F4E0
    359C0F0DD70EFE7704DFEA6259CEB5EE 155629C832DF7E055390BAC8DC7E584C

Key 00000000000000000000000000000001, data 00000000000000000000000000000000:
    6ECB52A11B4B91243255A64BC19E2569 3C644F5D665453241B833934165D73EA
    1C3A724DB56DF562D6267E3EF596B465 794EC36D5F66A0258D8DD5899A839F3B

Key 00000000000000000000000000000000, data 00000000000000000000000000000001:
    8DAD3DB118E4D322CAE4D5BB52886FAC 76D03C929921D93DEBE27B85A32C958B
    C90082E7D1C4D3FD15035055F3F0BF4A 87DC22918CC37577F16A365D39389743

Key 80000000000000000000000000000000, data 00000000000000000000000000000000:
    3A09D79EA754F92BCA033D61043DC718 3834A25968F9BB8851DF6B74E54CFBF9
    BD777A663A1723A6A756D202495C5568 D3CC175C7FA8E0A18CB0D0CCAEED89E8

Key 00000000000000000000000000000000, data 80000000000000000000000000000000:
    1174069C5C11DB94D3370D93164D7AF2 A8F044878D18DD0D3807041AEAE4FD4A
    62002FFC0F7A195AD7BF7A67769C9E00 4643E899427E8E0C8A80DF0BBB6CF45C

Key FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, data 00000000000000000000000000000000:
    3BFD298E434866362106C4AB76AF0A7A 69A8E54F1371689F3416494647B609C5
    649798B583017433F0200541327A7147 A2FB894A96F7B272472937CC8DE5AC43

Key 00000000000000000000000000000000, data FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:
    808E2B93F99DC1A9B785923DDEE90FB8 32D5D877729E016756D4E7685BDEE3EE
    FFE3530CEFE4D06E489B7C953D92B793 978E1443BD79A4955C67AA29541E41F4

Key FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, data 00000000000000000000000000000001:
    DCB139726D614FC57ABD7FAD49B637E8 6398CB3D8AFBBD7CD60B40A6F32CC07A
    8C5BB518BE454F98246908382B9BADFB D73533AD307B49D13E9A669FAB8388A5

Key 00000000000000000000000000000001, data FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:
    91632C0F538D4E3B8D4D3864C5B7AD67 55C66BE2E100BA44BADE8AE0A19C605A
    66406B43D3575EEC1D087C392BEA2470 BE1089D9E8E46B3D56D8A08629C3AE4E

Key 01010101010101010101010101010101, data 01010101010101010101010101010101:
    A5FAEB7E949B483EAF4ADAA24BBE05E7 58B15708738F16EBFBFC5CC1B751140D
    656C0690E799A86940A2BBD312DFBE97 1A21FAA6EE4ECCBF11D0F7931674F80E

Key 80808080808080808080808080808080, data 80808080808080808080808080808080:
    2DB71DAF0C70379440628E097B94E7AA 3413A9F147F7FCCF0FEDEC09260A3B92
    A9E8D7E02746784A307788DCC8AF937B 7B256260215BF8DBE723A1AB5D6B43CC

Key 0123456789ABCDEFFEDCBA9876543210, data 0123456789ABCDEFFEDCBA9876543210:
    28BFB6706AD938A7E65F80AF5BD696C5 338D68D6845C2B2439AFD9B9063F21E1
    C286DF7607768B029E74F45EC5C111E5 A9F5A05965F8E1D5EB6DDBF972EEE682

Key FEDCBA98765432100123456789ABCDEF, data FEDCBA98765432100123456789ABCDEF:
    03A0191B31DF0A4E0D6FD8FBD0859CC8 78068A24F41F562DDCC86025BF627FCC
    5A60A289738D8529AC130237F9954E48 E357422099042326DE09593F5D0C8DFA

