------------ START OF THE ROSS WILLIAMS INFORMATION PACK -------------

Latest Revision: 27-Nov-1991.

Dear Correspondent,

I receive so many email requests for information that I have prepared
this special "Information Pack". When I read your email I judged that
the information you requested was in the pack and that we would both be
best served by my sending it to you. Hopefully the extra information
contained in the pack will compensate for the lack of a personal reply,
for which I apologise. You may also find this pack appended to a
personal message.

NOTE: If you feel that I have not answered your enquiry, or that for any
reason the sending of this pack was inappropriate, please email me at
ross@spam.ua.oz.au again, stating that you have already been sent the
pack (so as to prevent me sending it again).

Thank you for your interest,
Ross Williams

CONTENTS

 1. How to Contact Me
 2. General Information
 3. The Newsgroup comp.compression
 4. FTP Archive/Email Access to It
 5. Books on Data Compression
 6. Other Publications
 7. Data Compression Algorithms
 8. Description of the Data Compression Conference
 9. Data Compression Interface Standard
10. Data Compression State Of The Art Graph 21-Jun-1991
11. The Puzzle Box Provisional Patent
12. Gibson and Graybill Patent
13. The "Calgary Corpus"

1. How to Contact Me
--------------------
Name          : Dr Ross Williams.
Email address : ross@spam.ua.oz.au
Home phone    : +61 8 379-5020 (Note: South Australian time)
Fax           : +61 8 373-4911 (The fax is a few miles away; I would
                prefer email.)
Home address  : 16 Lerwick Avenue
                Hazelwood Park 5066
                South Australia
                Australia

Adelaide is a city of about one million on the southern coast of
Australia.

2. General Information
----------------------
1980-1983   University of Adelaide   Honours degree in Computer Science.
1984-1989   University of Adelaide   Ph.D. in Computer Science.
1989-1990   Australian National      Safety Critical Software Analyst.
1991-       Renaissance Software     Pursuing various commercial and
                                     goodwill projects.

Areas of interest or expertise: Data compression, user interfaces,
programming language design, networks, literate programming, safety
critical and trusted systems, games, Ada, C, Macintosh.

Hobbies: Skydiving, scuba diving.

3. The Newsgroup comp.compression
---------------------------------
I established the internet newsgroup comp.compression in March 1991 to
carry discussions on data compression. If you are interested in data
compression, you should take a look at this newsgroup.

4. FTP Archive/Email Access to It
---------------------------------
A number of files I have created are available by anonymous FTP from an
archive located at the University of Adelaide, South Australia. The
exact location of the directory is:

   Machine   : sirius.itd.adelaide.edu.au [IP=129.127.40.3]
   Directory : pub/compression

The files in this directory may be compressed. Files that are compressed
have names ending in ".Z" and must be uncompressed before use (using the
Unix "uncompress" command). I wouldn't trust my files to a data
compression algorithm :-), but the system programmer who put the files
there for me (I do not have direct control over this archive) seems to
think that it is a good idea.

0readme              - A descriptive file.
dc_stan_just         - Justification of dc_stan_spec.
dc_stan_spec         - Proposal for data compression interface standard.
dc_stan_stream       - Revised proposed standard.
dcc91_report         - Report on the 1991 Data Compression Conference.
fast_copy.68000      - Fast block move routine for the 68000 microprocessor.
lzrw1-a.68000        - LZRW1-A algorithm implemented in 68000 assembler.
lzrw1-a.c            - LZRW1-A algorithm implemented in C.
lzrw1-a.txt          - Release notes for the LZRW1-A algorithm.
lzrw1.68000          - LZRW1 algorithm implemented in 68000 assembler.
lzrw1.c              - LZRW1 algorithm implemented in C.
lzrw1.tex            - DCC91 conference paper describing the LZRW1 algorithm.
lzrw2.c              - LZRW2 algorithm implemented in C.
lzrw2.txt            - Release notes for the LZRW2 algorithm.
lzrw3-a.c            - LZRW3-A algorithm implemented in C.
lzrw3-a.txt          - Release notes for the LZRW3-A algorithm.
lzrw3.c              - LZRW3 algorithm implemented in C.
lzrw3.txt            - Release notes for the LZRW3 algorithm.
lzrw4.txt            - Description of the LZRW4 algorithm (not yet implemented).
lzrw45_covering      - Release notes for the LZRW4 and LZRW5 descriptions.
lzrw5.txt            - Description of the LZRW5 algorithm (not yet implemented).
lzrw_headers.h       - Header files (.h files) for the C code.
puzzlebox_provpatent - Provisional patent for the puzzle box.
rw_info              - Information about Ross Williams and his activities.

Mail-only users may access anonymous FTP by mail through an ftp-mail
server. Send the word 'help' in a message to ftpmail@cs.uow.edu.au for
information on how to use it. If that doesn't work, try
ftpmail@decwrl.dec.com (try the first address first, as it will result
in less network traffic). The system administrator at the first site is
ftpmail-adm@cs.uow.edu.au.

5. Books on Data Compression
----------------------------
I have published a book on data compression.

Title    : Adaptive Data Compression
Author   : Ross N. Williams
ISBN     : 0-7923-9085-7
Price    : US$75.
Publisher: Kluwer Books
           101 Philip Drive
           Assinippi Park
           Norwell MA 02061
           USA
           Ph: +1 (617) 871-6300
           Fx: +1 (617) 871-6528
           Nt: kluwer@world.std.com

Physical description:
   Size : Approx 24cm x 17cm x 3cm (or approx 9.5" x 6.5" x 1.25").
   Pages: Hardback, 382 pages, printed on acid-free paper.
   Contents include: 106 figures, 51 tables, 185 references, 24-page index.

Here is an abbreviated table of contents:

   Chapter 1: Introductory Survey................. 1
   Chapter 2: The DHPC Algorithm.................. 107
   Chapter 3: A Classification of Adaptivity...... 125
   Chapter 4: An Experimental Adaptive Algorithm.. 145
   Chapter 5: A Multimodal Algorithm.............. 245
   Chapter 6: Applications to User Interfaces..... 283
   Chapter 7: Conclusions......................... 305

Chapter 1 gives a comprehensive overview of the field of text data
compression. Chapter 2 describes a simple Markov algorithm called DHPC.
Chapter 3 investigates the use of adaptivity in data compression
algorithms. Chapter 4 describes the SAKDC algorithm (which currently
holds the record for the best compression of text data) and also
contains a wealth of experimental data. Chapter 5 introduces a new class
of adaptivity in data compression algorithms (multimodal adaptivity) and
describes the MMDC algorithm which implements it. Chapter 6 shows how
data compression techniques can be applied in user interfaces. There are
also appendices, one of which contains ideas for future research.

Here is the blurb that appears on the back of the book:

---Start of Blurb---
ADAPTIVE DATA COMPRESSION reviews the field of text data compression and
then addresses the problem of compressing rapidly changing data streams.
To compress such streams, the compressor must continuously revise its
model of the data, and sometimes even discard its model in favour of a
new one. While the techniques involved in data compression are often
complicated, the basic ideas are simple, and throughout ADAPTIVE DATA
COMPRESSION, Williams has endeavoured to keep them so.
The reader should be able to comprehend this volume with little or no
previous knowledge of data compression.

"The modern data compression paradigm furthered by this work is based
upon the separation of adaptive context modelling, adaptive statistics,
and arithmetic coding. This work offers the most complete bibliography
on this subject I am aware of. It provides an excellent and lucid review
of the field, and should be equally as beneficial to newcomers as to
those of us already in the field."
   (from the foreword by Glen G. Langdon, Jr.)
---End of Blurb---

There are two other books (by other authors) that those who are becoming
involved in data compression should consider purchasing. These are:

Title    : Text Compression
Authors  : Timothy C. Bell, John G. Cleary, Ian H. Witten
ISBN     : 0-13-911991-4
Price    : ?
Publisher: Prentice Hall
           Englewood Cliffs
           New Jersey 07632
           Ph: ?
           Fx: ?
           Nt: ?

Title    : Data Compression: Methods and Theory
Author   : James A. Storer
ISBN     : 0-88175-161-8
Price    : ?
Publisher: Computer Science Press
           1803 Research Boulevard
           Rockville, Maryland 20850
           Ph: ?
           Fx: ?
           Nt: ?

My book and the Bell book contain good overviews of the field of data
compression. The Storer book is narrower than the other two books,
concentrating mainly on text substitution methods.

27-Nov-1991: Stop press! A new book has come out. I haven't seen it yet.
I have heard that it has a much stronger practical orientation than the
other books. Here is the reference:

Title    : The Data Compression Book
Author   : Mark Nelson
ISBN     : 1-55851-216-0
Price    : US$36.95
Publisher: M&T Books, Redwood City, CA.
Year     : 1991

The book apparently comes with two IBM 5.25" disks containing all the
source code for the software described in the book. The book deals in
part with image compression (JPEG).

6. Other Publications
---------------------
The following paper is an early description of the DHPC algorithm and is
almost totally superseded by my book.

   Williams, R.N., "Dynamic-History Predictive Compression", Information
   Systems, 13(1), pp. 129-140, 1988.

The following paper describes the LZRW1 algorithm:

   Williams, R.N., IEEE Computer Society Data Compression Conference,
   8-11 April 1991, Snowbird, Utah (conference edited by James A. Storer
   and John H. Reif).

Reprints can be obtained from the IEEE, from whom the entire conference
proceedings are also available:

   IEEE Computer Society
   PO Box 3014
   10662 Los Vaqueros Circle
   Los Alamitos CA 90720-1264
   USA
   Ph: +1 (714) 821-8380

The funny IEEE digits appearing in the proceedings beneath my paper are:
TH0373-1/91/0000/0362/$01.00 (c) 1991 IEEE

I also have a few lightweight non-data-compression publications.

7. Data Compression Algorithms
------------------------------
I have developed the following data compression algorithms:

DHPC     - Simple Markov algorithm described in detail in my book.
SAKDC    - Sophisticated Markov algorithm described in detail in my book.
MMDC     - Mode swapping algorithm described in detail in my book.
LZRW1    - An extremely fast LZ algorithm.
LZRW1-A  - A slightly improved version of LZRW1.
LZRW2    - A slightly slower algorithm that gives better compression.
LZRW3    - A slightly slower algorithm that gives better compression.
LZRW3-A  - A slightly slower algorithm that gives better compression.
LZRW4    - A brand new concept in LZ algorithms.
LZRW5    - An extremely fast LZW-like algorithm.

Algorithms DHPC, SAKDC, MMDC, LZRW1, LZRW1-A, LZRW2, LZRW3, LZRW3-A, and
LZRW4 are all in the public domain and can be used freely. LZRW5
describes an idea that may be protected by the LZW patent.
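To give a feel for the general approach, here is a rough illustrative C
sketch of a hash-based LZ77 compressor of the same broad kind as the
LZRW family: a single hash table remembers the position of recently seen
strings, so each input position costs at most one hash, one table probe,
and one string comparison. The constants and the output format below are
invented for this sketch only and do not correspond to any of the
published LZRW algorithms; the real, tested implementations are in the
FTP archive described earlier.

   /* lz_sketch.c -- an illustrative hash-based LZ77 compressor. This is
      NOT any of the published LZRW algorithms; the constants and output
      format are invented for illustration only.                         */

   #include <stddef.h>

   #define HASH_BITS  12
   #define HASH_SIZE  (1 << HASH_BITS)  /* 4096 entries (arbitrary)      */
   #define MAX_OFFSET 4095              /* 12-bit backward offset        */
   #define MIN_MATCH  3
   #define MAX_MATCH  18                /* 4-bit length field: 3..18     */

   /* Hash the three bytes at p into a table index (arbitrary mixing).   */
   static unsigned hash3(const unsigned char *p)
   {
       return ((p[0] << 8) ^ (p[1] << 4) ^ p[2]) & (HASH_SIZE - 1);
   }

   /* Compress src[0..src_len-1] into dst, which must have room for at
      least src_len + src_len/8 + 1 bytes (the worst case). Each group of
      up to 8 items is preceded by a control byte: a set bit means a
      two-byte copy item (12-bit offset, 4-bit length-3); a clear bit
      means a literal byte. Returns the number of bytes written.         */
   size_t lz_sketch_compress(const unsigned char *src, size_t src_len,
                             unsigned char *dst)
   {
       static long hash_tab[HASH_SIZE]; /* last position with each hash  */
       size_t in = 0, out = 0, ctrl_pos = 0, i;
       unsigned ctrl_bit = 0;

       for (i = 0; i < HASH_SIZE; i++)
           hash_tab[i] = -1;

       while (in < src_len) {
           size_t match_len = 0;
           long   cand      = -1;

           if (ctrl_bit == 0) {         /* start a new group of 8 items  */
               ctrl_pos = out;
               dst[out++] = 0;
               ctrl_bit = 1;
           }
           if (in + MIN_MATCH <= src_len) {
               unsigned h = hash3(src + in);
               cand = hash_tab[h];      /* the one and only candidate    */
               hash_tab[h] = (long)in;  /* remember the current position */
               if (cand >= 0 && in - (size_t)cand <= MAX_OFFSET) {
                   size_t limit = src_len - in;
                   if (limit > MAX_MATCH) limit = MAX_MATCH;
                   while (match_len < limit &&
                          src[(size_t)cand + match_len] == src[in + match_len])
                       match_len++;
               }
           }
           if (match_len >= MIN_MATCH) { /* copy item: offset and length */
               size_t offset = in - (size_t)cand;
               dst[ctrl_pos] |= (unsigned char)ctrl_bit;
               dst[out++] = (unsigned char)(((offset >> 4) & 0xF0) |
                                            (match_len - MIN_MATCH));
               dst[out++] = (unsigned char)(offset & 0xFF);
               in += match_len;
           } else {                      /* literal item: one raw byte   */
               dst[out++] = src[in++];
           }
           ctrl_bit = (ctrl_bit << 1) & 0xFF;
       }
       return out;
   }

The design trades compression for speed: because only one candidate
match is ever examined, no searching is done, and that is what makes
this class of algorithm so fast.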
(STOP PRESS 30-Sep-1991: A patent has arisen that may cover the LZRW*
algorithms. See the section "Gibson and Graybill Patent".)

Of these algorithms, LZRW3-A will probably be of the most interest to
the most people. The LZRW1-LZRW3 algorithms are all tuned to yield high
speed at the expense of compression. LZRW3-A yields similar compression
to Unix compress, while running about twice as fast when compressing and
three times as fast when decompressing. It requires just 16K of memory
(apart from input and output buffers).

DHPC, SAKDC, MMDC
-----------------
These algorithms were written in Ada using a custom preprocessor on a
machine that I no longer have access to. Unless tempted by money, I am
not prepared to retrieve this code and make it public for at least one
year.

LZRW1, LZRW1-A, LZRW2, LZRW3, LZRW3-A
-------------------------------------
Working public domain C code for these patent-free algorithms is
available in the FTP archive described earlier. Implementations of LZRW1
and LZRW1-A in 68000 machine code are also available in the same
archive.

LZRW4-LZRW5
-----------
These two algorithms were released as algorithm descriptions only. To my
knowledge, no implementations currently exist. These two files will be
of interest mainly to those developing data compression algorithms.
Shortly after the release of the LZRW5 description, it was found that
the idea had been invented before. See the LZRW5 file for details.

8. Description of the Data Compression Conference
--------------------------------------------------
In April 1991 the world's first data compression conference was held in
Snowbird, Utah. The conference proceedings have been published by the
IEEE and you can obtain a copy ("DCC91 proceedings") from:

   IEEE Computer Society
   PO Box 3014
   10662 Los Vaqueros Circle
   Los Alamitos CA 90720-1264
   USA
   Ph: +1 (714) 821-8380

The proceedings take the form of a small, strongly bound book and are
essential reading for compressor heads. It may also be possible to
obtain a list of the papers presented and then individual reprints.

I have written a description of the conference. The description includes
a table of contents for the proceedings. The description can be found in
the FTP archive described earlier.

9. Data Compression Interface Standard
--------------------------------------
I have proposed a standard for interfacing data compression algorithms.
There are now so many data compression algorithms, with so many
different characteristics, written in so many different languages, on so
many different systems, that there is a pressing need for a standardized
data compression algorithm interface. A standardized interface would
yield the following benefits:

* The standard interface would allow programmers to code for generic
  data compression and then quickly and easily swap between various
  algorithms until they find the one best suited to their application.
  Writing glue code may be trivial, but if one has to do it twenty
  times, it can become prohibitively time consuming.

* A standard interface would assist those cataloguing and comparing
  algorithms.

* A standard interface would encourage the designers of data compression
  algorithms to provide conforming implementations.

* A standard interface would make it easy for programmers to write
  programs that select from a range of data compression algorithms at
  run time.

The proposed standards documents can be found in the FTP archive
mentioned earlier. There are two standards in the archive. The first
attracted enough criticism to warrant a rewrite. The second "stream"
standard is currently stalled pending further revision.
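Purely by way of illustration, here is a hypothetical C sketch of the
kind of uniform interface such a standard might define. None of the
names or fields below are taken from the proposed standard itself; for
the actual proposals, see the dc_stan_spec and dc_stan_stream files in
the FTP archive.

   /* dc_iface_sketch.h -- a hypothetical illustration of the kind of
      uniform C interface a data compression standard might define. This
      is NOT the proposed standard; all names and fields are invented.   */

   #include <stddef.h>

   /* Static description of an algorithm, so that catalogues and
      run-time menus can be built without knowing its internals.         */
   typedef struct {
       const char *name;          /* e.g. "LZRW3-A"                      */
       size_t      work_memory;   /* bytes of working memory required    */
   } dc_identity;

   /* Every conforming algorithm would supply these operations with
      exactly these signatures, so a caller can swap algorithms by
      swapping a single structure instead of rewriting glue code.        */
   typedef struct {
       dc_identity identity;
       void (*compress)  (void *work_mem,
                          const unsigned char *src, size_t src_len,
                          unsigned char *dst, size_t *p_dst_len);
       void (*decompress)(void *work_mem,
                          const unsigned char *src, size_t src_len,
                          unsigned char *dst, size_t *p_dst_len);
   } dc_algorithm;

With an interface of this kind, a program that selects an algorithm at
run time needs nothing more than a table of such structures to index
into.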
10. Data Compression State Of The Art Graph 21-Jun-1991
-------------------------------------------------------
I used the information posted by Peter Gutmann
(peter_gutmann@kcbbs.gen.nz) to plot this performance graph of
compression algorithms by hand. The graph is only rough, but gives a
good idea of what is going on out there. The capital letter of each
compression algorithm's name marks its position on the graph. I didn't
plot algorithms with no listed speed (except for SAKDC, which gives the
best compression).

[Graph: compression in Bits/Byte (5 = BAD at the top, down to 2 = GOOD
at the bottom) versus Compression Speed in Kilo (1024) Bytes Per Second
(0 to 256, doubling scale). Programs plotted: Comp-1, Pack, Lzrw1,
Comp-1a, Brent, Stuffit1, Zoo2, Compress, DwcPkarc, Comp-2(1), Disdoub,
Pak1.00, Stuffitcl, Lharc1.13c, Arj1, Pak2, Pkzip1.10, Compactor,
Lharc2.11, Arj2, and Sakdc. A starred line marks the state of the art,
running from Sakdc (best compression, lowest speed) up to the fastest
programs; the dotted zone beyond it is where we want to go.]

This graph is probably misleading because some of the programs were run
on a Mac-SE and some on a PC386/25.

11. The Puzzle Box Provisional Patent
-------------------------------------
In early 1991, I submitted a provisional patent application for a device
which I call a "puzzle box". The device sits between computers and
critical devices and assists in preventing the accidental activation of
those devices. All the work done on puzzle boxes appears in the
provisional patent application, which can be found in the FTP archive
described earlier. The current state of my puzzle box project is that I
am deciding whether to continue with a full patent application.

12. Gibson and Graybill Patent
------------------------------
In mid-September 1991 a patent was granted that presents a serious
threat to the LZRW* series of algorithms. Here is a recent posting
(abridged) that I made to comp.compression about the patent.

From: ross@spam.ua.oz.au (Ross Williams)
Newsgroups: comp.compression
Subject: Gibson and Graybill patent.
Date: 28 Sep 91 17:08:19 GMT

...

Anyway, my worst fears are realized. This patent strikes at the heart of
the LZRW* series of algorithms, possibly shutting the door on LZRW1,
LZRW1-A, LZRW2, LZRW3, LZRW3-A, and LZRW4. After all the work I have put
into this series, I would be most disappointed to see them all moved
from the public to the private domain.

The bad news is that the application date of 18-Jun-1990 is BEFORE the
date when I published the LZRW1 algorithm (at DCC91 in April 1991,
submitted about November 1990). The good news is that I invented LZRW1
in lateish 1989 (about August, I think) and I can dig up lots of
evidence of this. I have at least the following evidence for my 1989
development, and for later on too:

* Detailed time logs of every 15-minute granularity period that I spent
  developing LZRW1 and what I did.

* A detailed design log indicating all the ideas I came up with in
  developing LZRW1. This includes all the dead ends too.

* Files of early versions of LZRW1 in C and 68000 code dated late 1989.
  Maybe some printouts too.

So there is a chance that we can fight this one.
Maybe a lawyer could comment on the strength of the above.

This is all very maddening to me. I suppose this note will be maddening
to the patent holders and I hope they will not be too upset. However, I
really think that it would be best if this patent was squashed, as it
locks up too much of the LZ77 hierarchy which, up to now, has been
relatively patent free.

Ross Williams
ross@spam.ua.oz.au
29-Sep-1991

#-------------------------------------------------------------------------
#2182704 3156012
#E/ APPARATUS AND METHOD FOR VERY HIGH DATA RATE-COMPRESSION INCORPORATING
#   LOSSLESS DATA COMPRESSION AND EXPANSION UTILIZING A HASHING TECHNIQUE
#Document Type: UTILITY
#Inventors: Gibson Dean K (US); Graybill Mark D (US)
#Assignee: Intersecting Concepts Inc
#                     Applic     Applic    Patent       Issue
#                     Number     Date      Number       Date
#                     ---------  ------    ----------   ------
#Patent:           US 539880     900618    US 5049881   910917
#Priority Applic:  US 539880     900618
#Abstract:
#A method and apparatus for compressing digital data that is represented as
#a sequence of characters drawn from an alphabet. An input data block is
#processed into an output data block composed of sections of variable
#length. Unlike most prior art methods which emphasize the creation of a
#dictionary comprised of a tree with nodes or a set of strings, the present
#invention creates its own pointers from the sequence characters previously
#processed and emphasizes the highest priority on maximizing the data
#rate-compression factor product. The use of previously input data acting as
#the dictionary combined with the use of a hashing algorithm to find
#candidates for string matches and the absence of a traditional string
#matching table and associated search time allows the compressor to very
#quickly process the input data block. Therefore, the result is a high data
#rate-compression factor product achieved due to the absence of any string
#storage table and matches being tested only against one string.
#
#Exemplary Claim:
#1. A compression method for compressing a stream of input data into a
#   compressed stream of output data based on a minimum number of characters
#   in each input data subblock to be compressed, said compression method
#   comprising the steps of: a. initializing a hash table and initializing
#   an SRC pointer; b. processing input data in the order in which the
#   characters in the data appear and hashing input data subblocks of the
#   minimum compression size selected; c. maintaining a hash table which
#   contains at each entry, an SRC pointer which points to a previous
#   subblock which hashed to this hash table entry, such that the
#   possibility of any string of data previously occurring in the input
#   block may be tested by hashing the current subblock to a hash table
#   entry, obtaining the previous SRC pointer contained in that entry, and
#   comparing the two strings of data; d. if the two strings of data match
#   on at least the size of the subblock, then generating a backwards
#   pointer to the previous occurrence of the same string of data and
#   thereby compressing the second occurrence of the string of data; e. if
#   the two strings of data do not match, then storing the string of data as
#   incompressible data; and f. continuing steps b. through e. until the
#   entire input data has been processed.
#
#Class: 341095000
#Class Cross Ref: 341051000; 341055000
#IPC: H03M-007/30
#-------------------------------------------------------------------------
#--
#Mike Doughney            mike@access.digex.com            +1 301 220 2020
#Digital Express Group, 6006 Greenbelt Road #228, Greenbelt, Maryland 20770
#   **** Express Access: Public Access Unix for Washington/Baltimore ****
#   ****        For more information, mail to info@digex.com         ****

13. The "Calgary Corpus"
------------------------
There is an informally standard suite of data files that is used when
comparing the practical performance of text data compression algorithms.
You can get it using anonymous FTP from:

   Machine   : fsa.cpsc.ucalgary.ca [IP=136.159.2.1]
   Directory : /pub/text.compression.corpus/

The corpus was established by Bell, Cleary, and Witten of the University
of Calgary and is used extensively in their book, in my book, and by the
text data compression community in general.

------------- END OF THE ROSS WILLIAMS INFORMATION PACK --------------