	Notes on hash based symmetric encryption and signatures
	by Michael Graffam (mgraffam@mhv.net)


The theory of a hash-based cipher is a simple one: grab a key from
the user and use this as the initial input to the hash algorithm;
the output values are then treated as random numbers to be xor'd with
the plaintext.
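
As a rough illustration of the idea, here is a minimal Python sketch for
a single block. The choice of SHA-256 and the names xor_bytes and
encrypt_block are assumptions made for the example, not part of the
original notes.

```python
import hashlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_block(user_key: bytes, block: bytes) -> bytes:
    # The digest of the key serves as the pseudo-random pad (32 bytes).
    pad = hashlib.sha256(user_key).digest()
    if len(block) > len(pad):
        raise ValueError("block is longer than one digest")
    return xor_bytes(pad, block)

ciphertext = encrypt_block(b"secret key", b"attack at dawn")
# XOR is its own inverse, so the same call decrypts.
recovered = encrypt_block(b"secret key", ciphertext)
```

This only covers plaintext up to one digest in length, which is exactly
where the question of getting more bits comes in.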

Just what we do when we run out of pseudo-random bits is the interesting
part. The simplest solution, of course, is to roll back to the
beginning of the hash and keep encrypting. This method yields a
glorified and still completely insecure Vigenere cipher.
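
To see why the roll-back approach is weak, the sketch below (SHA-256 and
the function names are assumptions for illustration) reuses one digest
as a repeating pad and shows that the pad cancels out when two
ciphertext blocks are xor'd together.

```python
import hashlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_repeating(user_key: bytes, plaintext: bytes) -> bytes:
    pad = hashlib.sha256(user_key).digest()
    # Wrap around to the start of the pad whenever it runs out.
    return bytes(p ^ pad[i % len(pad)] for i, p in enumerate(plaintext))

block1 = b"A" * 32
block2 = b"B" * 32
ct = encrypt_repeating(b"secret key", block1 + block2)

# Both blocks used the same pad, so xor'ing the ciphertext blocks
# yields the xor of the plaintext blocks with no key material left.
leak = xor_bytes(ct[:32], ct[32:])
```

Here leak equals xor_bytes(block1, block2), which an attacker can
compute from the ciphertext alone.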

A more cryptographically sound method is to simply throw the hash values
that we had been using back into the one-way function and get some
new bits. While this method is safe against ciphertext-only attacks
(unlike the previous method), it is still vulnerable to a known
plaintext attack. If the attacker knows N bits of plaintext (where
N is the length of the digest), then he can easily compute what hash
was used by xor'ing them with the matching ciphertext. While he
cannot use this hash to get previous bits of
plaintext, all plaintext from that section onward can be read, and
perhaps more importantly, the attacker can change the plaintext
such that upon decryption the message is different (and still
meaningful). 
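
The known plaintext attack can be sketched as follows. This is an
illustrative Python example only: SHA-256, the 32-byte block size, and
all names (H, encrypt_chain, known, secret) are assumptions.

```python
import hashlib

DIGEST = 32  # SHA-256 digest length in bytes

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_chain(user_key: bytes, plaintext: bytes) -> bytes:
    h = H(user_key)
    out = bytearray()
    for i in range(0, len(plaintext), DIGEST):
        out += xor_bytes(h, plaintext[i:i + DIGEST])
        h = H(h)  # the next pad depends only on the previous pad
    return bytes(out)

known = b"From: alice@example.com".ljust(DIGEST)  # attacker knows this
secret = b"meet me at the usual place"            # attacker wants this
ct = encrypt_chain(b"secret key", known + secret)

# The attacker has ct and known, but not the key.
pad0 = xor_bytes(ct[:DIGEST], known)      # recover the first pad
pad1 = H(pad0)                            # derive the following pad
recovered = xor_bytes(ct[DIGEST:], pad1)  # read the next block

# The same pad lets the attacker substitute a message of his own.
forged = ct[:DIGEST] + xor_bytes(pad1, b"meet me somewhere else")
```

Because the chain never involves the plaintext, encrypt_chain also
decrypts, and the holder of the key will decrypt forged to the
attacker's text without noticing anything wrong.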

Obtaining new random bits by hashing the original key and the old
hash solves the problem of a known plaintext attack, but there is
still one cryptanalytic attack to be overcome.

Hashing a key, and then hashing the hash+key will always generate
the same random number stream regardless of the plaintext being
encrypted. From a cryptanalytic point of view, once a key is reused
for a second message, we can treat an encryption system based on this
stream like a one-time pad that has been used twice.
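
A small sketch of the key+hash chain and its reuse problem follows.
SHA-256, the concatenation order (key first, then old hash), and the
function names are assumptions made for the example.

```python
import hashlib

DIGEST = 32  # SHA-256 digest length in bytes

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def keystream(user_key: bytes, nblocks: int) -> bytes:
    h = H(user_key)
    out = bytearray()
    for _ in range(nblocks):
        out += h
        h = H(user_key + h)  # the key is mixed in with the old hash
    return bytes(out)

def encrypt(user_key: bytes, plaintext: bytes) -> bytes:
    nblocks = -(-len(plaintext) // DIGEST)  # ceiling division
    return xor_bytes(keystream(user_key, nblocks), plaintext)

m1 = b"first message, sent on Monday"
m2 = b"second message, sent Tuesday!"
c1 = encrypt(b"secret key", m1)
c2 = encrypt(b"secret key", m2)

# Same key, same stream: the pad cancels just as with a reused
# one-time pad, leaving the xor of the two plaintexts.
leak = xor_bytes(c1, c2)
```

Knowing one pad no longer reveals the next, since the key is needed to
compute it, but leak still equals xor_bytes(m1, m2).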

What we need is a random number stream that is affected by the
plaintext values so that even if the same user key is used 
on two different messages the random bits used to encrypt the
messages will be different. Therefore, we use the following
algorithm:

	HASH(User_key) -> H
     +->H xor Plaintext_block -> Ciphertext_block
     |  HASH(User_key + Plaintext_block + H) -> H
     |  Get new Plaintext_block
     |  |
     +--+
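
The loop above might be written in Python roughly as below. This is a
sketch under assumptions: SHA-256 stands in for the hash, "+" is read
as byte concatenation in the order shown, and blocks are one digest
long. Decryption needs its own routine because each new hash depends
on the plaintext block just recovered.

```python
import hashlib

DIGEST = 32  # SHA-256 digest length in bytes

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(user_key: bytes, plaintext: bytes) -> bytes:
    h = H(user_key)
    out = bytearray()
    for i in range(0, len(plaintext), DIGEST):
        block = plaintext[i:i + DIGEST]
        out += xor_bytes(h, block)
        h = H(user_key + block + h)  # plaintext feeds the next pad
    return bytes(out)

def decrypt(user_key: bytes, ciphertext: bytes) -> bytes:
    h = H(user_key)
    out = bytearray()
    for i in range(0, len(ciphertext), DIGEST):
        block = xor_bytes(h, ciphertext[i:i + DIGEST])
        out += block
        h = H(user_key + block + h)  # same update, using recovered text
    return bytes(out)

key = b"secret key"
m1 = b"A" * DIGEST + b"B" * DIGEST
m2 = b"C" * DIGEST + b"B" * DIGEST
c1, c2 = encrypt(key, m1), encrypt(key, m2)
first_block_leak = xor_bytes(c1[:DIGEST], c2[:DIGEST])
```

In this sketch the second blocks of m1 and m2 are identical but
encrypt differently, because their first blocks differ. Note, however,
that the very first pad is always HASH(User_key), so the first blocks
of two messages under the same key still share a pad:
first_block_leak equals the xor of the two first plaintext blocks.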

If the attacker knows the very first plaintext block, he can
calculate the hash that it was xor'd against. But he cannot use
this information to get the user's key (because the hash is a
one-way function) and he cannot get subsequent plaintexts because he
would need the user's key to calculate the next hash value. The
system therefore resists the known plaintext attack described above.

All that having been said, if you need to keep something private,
and you want to use a symmetric key cipher, use IDEA or Blowfish.
Hash functions are not really meant to be used in this fashion,
and will be slower than an algorithm like IDEA. There is also
the problem that hash functions are generally not designed to
be used as a source of random numbers; they are designed to give
a digest of an arbitrary length input. It could very well be that
a hash function used in the manner described above might expose
characteristics to cryptanalytic attack that would not ordinarily
be a problem when used in the manner it was designed for. This is
unlikely to be a problem, in my opinion, but it is something to
think about.

Not to mention, it's slow ;)

So why did I take the time to write up a cipher based on this? Well,
I was bored, and I wanted to learn about SHA, so I started playing
with the source and started reading the relevant portions in the
Handbook of Applied Cryptography. But there are some very practical purposes
for designing such ciphers and exploring their properties.

One, unfortunately very real, use is in the area of export regulations.
It is not illegal to export a hash function but it is illegal to
export a cipher in some countries (you get three guesses to name a
country, the first 2 don't count). By making ciphers based on hash
functions, source code could be split into two parts: a random number
generation part and code to XOR bytes together. Both of these packages
should then be able to be exported and simply compiled together with
no legal troubles.

Another, more practical area of use would be in low memory situations
where both hashing and crypto are required. It could be overly expensive
to have two algorithms sitting around; use of one algorithm for both
applications might be needed. 

Symmetric signatures are another matter entirely. They work like this:
to sign a document, one simply prepends a secret key and hashes the
new document. This message digest is your signature for the original
message. For someone to verify that you signed the document, he checks
like we would in real life: he asks you to sign it and compares the
signatures.
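
A minimal sketch of this signing scheme in Python, assuming SHA-256 as
the hash; the names sign and signatures_match are made up for the
example. hmac.compare_digest from the standard library is used only
for a constant-time comparison.

```python
import hashlib
import hmac

def sign(secret_key: bytes, document: bytes) -> bytes:
    # Prepend the secret key to the document and hash the result.
    return hashlib.sha256(secret_key + document).digest()

def signatures_match(sig_a: bytes, sig_b: bytes) -> bool:
    # Constant-time comparison of two digests.
    return hmac.compare_digest(sig_a, sig_b)

document = b"I agree to meet at noon."
original = sign(b"my secret key", document)

# To verify, the signer is asked to sign the same document again.
again = sign(b"my secret key", document)
same = signatures_match(original, again)      # True

# Signing with any other key gives an unrelated digest.
other = sign(b"some other key", document)
differs = signatures_match(original, other)   # False
```

One caveat worth knowing: with hashes built like SHA-1 and SHA-256, a
plain key-prefix digest is subject to length-extension tricks, which is
why HMAC is the usual construction for keyed hashing today.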

Of course, you can repudiate your signature at any time. If someone
wants you to sign the document, and you don't want them to know
that you signed it, you simply don't use the correct key. Such a
system is useful in an environment where you would like to prove
to Winston that you signed a document, and are thus his ally, but
don't want to be held accountable for that signature when you
go over to the Ministry of Love to have tea and talk over your
political views.
