------------ START OF THE ROSS WILLIAMS INFORMATION PACK -------------

Latest Revision: 27-Nov-1991.

Dear Correspondent,

I receive so many email requests for information that I have prepared
this special "Information Pack". When I read your email I judged that
the information you requested was in the pack and that we would both be
best served by my sending it to you. Hopefully the extra information
contained in the pack will compensate for the lack of a personal reply,
for which I apologise. You may also find this pack appended to a
personal message.

NOTE: If you feel that I have not answered your enquiry, or that for any
reason the sending of this pack was inappropriate, please email me at
ross@spam.ua.oz.au again, stating that you have already been sent the
pack (so as to prevent me sending it again).

Thank you for your interest,
Ross Williams

CONTENTS

 1. How to Contact Me
 2. General Information
 3. The Newsgroup comp.compression
 4. FTP Archive/Email Access to It
 5. Books on Data Compression
 6. Other Publications
 7. Data Compression Algorithms
 8. Description of the Data Compression Conference
 9. Data Compression Interface Standard
10. Data Compression State Of The Art Graph 21-Jun-1991
11. The Puzzle Box Provisional Patent
12. Gibson and Graybill Patent
13. The "Calgary Corpus"

1. How to Contact Me
--------------------
Name          : Dr Ross Williams.
Email address : ross@spam.ua.oz.au
Home phone    : +61 8 379-5020 (Note: South Australian time)
Fax           : +61 8 373-4911 (The fax is a few miles away; I would
                prefer email.)
Home address  : 16 Lerwick Avenue
                Hazelwood Park 5066
                South Australia
                Australia

Adelaide is a city of about one million on the southern coast of
Australia.

2. General Information
----------------------
1980-1983   University of Adelaide   Honours degree in Computer Science.
1984-1989   University of Adelaide   Ph.D. in Computer Science.
1989-1990   Australian National      Safety Critical Software Analyst.
1991-       Renaissance Software     Pursuing various commercial and
                                     goodwill projects.

Areas of interest or expertise: Data compression, user interfaces,
programming language design, networks, literate programming, safety
critical and trusted systems, games, Ada, C, Macintosh.

Hobbies: Skydiving, scuba diving.

3. The Newsgroup comp.compression
---------------------------------
I established the internet newsgroup comp.compression in March 1991 to
carry discussions on data compression. If you are interested in data
compression, you should take a look at this newsgroup.

4. FTP Archive/Email Access to It
---------------------------------
A number of files I have created are available by anonymous FTP from an
archive located at the University of Adelaide, South Australia. The
exact location of the directory is:

   Machine   : sirius.itd.adelaide.edu.au [IP=129.127.40.3]
   Directory : pub/compression

The files in this directory may be compressed. Files that are compressed
have names ending in ".Z" and must be uncompressed before use (using the
Unix "uncompress" command). I wouldn't trust my files to a data
compression algorithm :-), but the system programmer who put the files
there for me (I do not have direct control over this archive) seems to
think that it is a good idea.

0readme              - A descriptive file.
dc_stan_just         - Justification of dc_stan_spec.
dc_stan_spec         - Proposal for data compression interface standard.
dc_stan_stream       - Revised proposed standard.
dcc91_report         - Report on the 1991 Data Compression Conference.
fast_copy.68000      - Fast block move routine for the 68000 microprocessor.
lzrw1-a.68000        - LZRW1-A algorithm implemented in 68000 assembler.
lzrw1-a.c            - LZRW1-A algorithm implemented in C.
lzrw1-a.txt          - Release notes for the LZRW1-A algorithm.
lzrw1.68000          - LZRW1 algorithm implemented in 68000 assembler.
lzrw1.c              - LZRW1 algorithm implemented in C.
lzrw1.tex            - DCC91 conference paper describing the LZRW1 algorithm.
lzrw2.c              - LZRW2 algorithm implemented in C.
lzrw2.txt            - Release notes for the LZRW2 algorithm.
lzrw3-a.c            - LZRW3-A algorithm implemented in C.
lzrw3-a.txt          - Release notes for the LZRW3-A algorithm.
lzrw3.c              - LZRW3 algorithm implemented in C.
lzrw3.txt            - Release notes for the LZRW3 algorithm.
lzrw4.txt            - Description of the LZRW4 algorithm (not yet implemented).
lzrw45_covering      - Release notes for the LZRW4 and LZRW5 descriptions.
lzrw5.txt            - Description of the LZRW5 algorithm (not yet implemented).
lzrw_headers.h       - Header files (.h files) for the C code.
puzzlebox_provpatent - Provisional patent for the puzzle box.
rw_info              - Information about Ross Williams and his activities.

Mail-only users may access anonymous FTP by mail through an ftp-mail
server. Send the word 'help' in a message to ftpmail@cs.uow.edu.au for
information on how to use it. If that doesn't work, try
ftpmail@decwrl.dec.com (try the first address first, as it will result
in less network traffic). The system administrator at the first site is
ftpmail-adm@cs.uow.edu.au.

5. Books on Data Compression
----------------------------
I have published a book on data compression.

Title    : Adaptive Data Compression
Author   : Ross N. Williams
ISBN     : 0-7923-9085-7
Price    : US$75.
Publisher: Kluwer Books
           101 Philip Drive
           Assinippi Park
           Norwell MA 02061
           USA
           Ph: +1 (617) 871-6300
           Fx: +1 (617) 871-6528
           Nt: kluwer@world.std.com

Physical description:
   Size : Approx 24cm x 17cm x 3cm (or approx 9.5" x 6.5" x 1.25").
   Pages: Hardback, 382 pages, printed on acid-free paper.
   Contents include: 106 figures, 51 tables, 185 references, 24-page index.

Here is an abbreviated table of contents:

   Chapter 1: Introductory Survey................. 1
   Chapter 2: The DHPC Algorithm.................. 107
   Chapter 3: A Classification of Adaptivity...... 125
   Chapter 4: An Experimental Adaptive Algorithm.. 145
   Chapter 5: A Multimodal Algorithm.............. 245
   Chapter 6: Applications to User Interfaces..... 283
   Chapter 7: Conclusions......................... 305

Chapter 1 gives a comprehensive overview of the field of text data
compression. Chapter 2 describes a simple Markov algorithm called DHPC.
Chapter 3 investigates the use of adaptivity in data compression
algorithms. Chapter 4 describes the SAKDC algorithm (which currently
holds the record for the best compression of text data) and also
contains a wealth of experimental data. Chapter 5 introduces a new class
of adaptivity in data compression algorithms (multimodal adaptivity) and
describes the MMDC algorithm which implements it. Chapter 6 shows how
data compression techniques can be applied in user interfaces. There are
also appendices, one of which contains ideas for future research.

Here is the blurb that appears on the back of the book:

---Start of Blurb---
ADAPTIVE DATA COMPRESSION reviews the field of text data compression and
then addresses the problem of compressing rapidly changing data streams.
To compress such streams, the compressor must continuously revise its
model of the data, and sometimes even discard its model in favour of a
new one. While the techniques involved in data compression are often
complicated, the basic ideas are simple, and throughout ADAPTIVE DATA
COMPRESSION, Williams has endeavoured to keep them so.
The reader should be able to comprehend this volume with little or no
previous knowledge of data compression.

"The modern data compression paradigm furthered by this work is based
upon the separation of adaptive context modelling, adaptive statistics,
and arithmetic coding. This work offers the most complete bibliography
on this subject I am aware of. It provides an excellent and lucid review
of the field, and should be equally as beneficial to newcomers as to
those of us already in the field."
   (from the foreword by Glen G. Langdon, Jr.)
---End of Blurb---

There are two other books (by other authors) that those who are becoming
involved in data compression should consider purchasing. These are:

Title    : Text Compression
Authors  : Timothy C. Bell, John G. Cleary, Ian H. Witten
ISBN     : 0-13-911991-4
Price    : ?
Publisher: Prentice Hall
           Englewood Cliffs
           New Jersey 07632
           Ph: ?
           Fx: ?
           Nt: ?

Title    : Data Compression: Methods and Theory
Author   : James A. Storer
ISBN     : 0-88175-161-8
Price    : ?
Publisher: Computer Science Press
           1803 Research Boulevard
           Rockville, Maryland 20850
           Ph: ?
           Fx: ?
           Nt: ?

My book and the Bell book contain good overviews of the field of data
compression. The Storer book is narrower than the other two books,
concentrating mainly on text substitution methods.

27-Nov-1991: Stop press! A new book has come out. I haven't seen it yet.
I have heard that it has a much stronger practical orientation than the
other books. Here is the reference:

Title    : The Data Compression Book
Author   : Mark Nelson
ISBN     : 1-55851-216-0
Price    : US$36.95
Publisher: M&T Books, Redwood City, CA.
Year     : 1991

The book apparently comes with two IBM 5.25" disks containing all the
source code for the software described in the book. The book deals in
part with image compression (JPEG).

6. Other Publications
---------------------
The following paper is an early description of the DHPC algorithm and is
almost totally superseded by my book.

   Williams, R.N., "Dynamic-History Predictive Compression", Information
   Systems, 13(1), pp. 129-140, 1988.

The following paper describes the LZRW1 algorithm:

   Williams, R.N., IEEE Computer Society Data Compression Conference,
   8-11 April 1991, Snowbird, Utah (conference edited by James A. Storer
   and John H. Reif).

Reprints can be obtained from the IEEE, from whom the entire conference
proceedings are also available:

   IEEE Computer Society
   PO Box 3014
   10662 Los Vaqueros Circle
   Los Alamitos CA 90720-1264
   USA
   Ph: +1 (714) 821-8380

The funny IEEE digits appearing in the proceedings beneath my paper are:
TH0373-1/91/0000/0362/$01.00 (c) 1991 IEEE

I also have a few lightweight non-data-compression publications.

7. Data Compression Algorithms
------------------------------
I have developed the following data compression algorithms:

DHPC     - Simple Markov algorithm described in detail in my book.
SAKDC    - Sophisticated Markov algorithm described in detail in my book.
MMDC     - Mode swapping algorithm described in detail in my book.
LZRW1    - An extremely fast LZ algorithm.
LZRW1-A  - A slightly improved version of LZRW1.
LZRW2    - A slightly slower algorithm that gives better compression.
LZRW3    - A slightly slower algorithm that gives better compression.
LZRW3-A  - A slightly slower algorithm that gives better compression.
LZRW4    - A brand new concept in LZ algorithms.
LZRW5    - An extremely fast LZW-like algorithm.

Algorithms DHPC, SAKDC, MMDC, LZRW1, LZRW1-A, LZRW2, LZRW3, LZRW3-A, and
LZRW4 are all in the public domain and can be used freely. LZRW5
describes an idea that may be protected by the LZW patent.
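To give a feel for the general approach, here is a rough illustrative C
sketch of a hash-based LZ77 compressor of the same broad kind as the
LZRW family: a single hash table remembers the position of recently seen
strings, so each input position costs at most one hash, one table probe,
and one string comparison. The constants and the output format below are
invented for this sketch only and do not correspond to any of the
published LZRW algorithms; the real, tested implementations are in the
FTP archive described earlier.

   /* lz_sketch.c -- an illustrative hash-based LZ77 compressor. This is
      NOT any of the published LZRW algorithms; the constants and output
      format are invented for illustration only.                         */

   #include <stddef.h>

   #define HASH_BITS  12
   #define HASH_SIZE  (1 << HASH_BITS)  /* 4096 entries (arbitrary)      */
   #define MAX_OFFSET 4095              /* 12-bit backward offset        */
   #define MIN_MATCH  3
   #define MAX_MATCH  18                /* 4-bit length field: 3..18     */

   /* Hash the three bytes at p into a table index (arbitrary mixing).   */
   static unsigned hash3(const unsigned char *p)
   {
       return ((p[0] << 8) ^ (p[1] << 4) ^ p[2]) & (HASH_SIZE - 1);
   }

   /* Compress src[0..src_len-1] into dst, which must have room for at
      least src_len + src_len/8 + 1 bytes (the worst case). Each group of
      up to 8 items is preceded by a control byte: a set bit means a
      two-byte copy item (12-bit offset, 4-bit length-3); a clear bit
      means a literal byte. Returns the number of bytes written.         */
   size_t lz_sketch_compress(const unsigned char *src, size_t src_len,
                             unsigned char *dst)
   {
       static long hash_tab[HASH_SIZE]; /* last position with each hash  */
       size_t in = 0, out = 0, ctrl_pos = 0, i;
       unsigned ctrl_bit = 0;

       for (i = 0; i < HASH_SIZE; i++)
           hash_tab[i] = -1;

       while (in < src_len) {
           size_t match_len = 0;
           long   cand      = -1;

           if (ctrl_bit == 0) {         /* start a new group of 8 items  */
               ctrl_pos = out;
               dst[out++] = 0;
               ctrl_bit = 1;
           }
           if (in + MIN_MATCH <= src_len) {
               unsigned h = hash3(src + in);
               cand = hash_tab[h];      /* the one and only candidate    */
               hash_tab[h] = (long)in;  /* remember the current position */
               if (cand >= 0 && in - (size_t)cand <= MAX_OFFSET) {
                   size_t limit = src_len - in;
                   if (limit > MAX_MATCH) limit = MAX_MATCH;
                   while (match_len < limit &&
                          src[(size_t)cand + match_len] == src[in + match_len])
                       match_len++;
               }
           }
           if (match_len >= MIN_MATCH) { /* copy item: offset and length */
               size_t offset = in - (size_t)cand;
               dst[ctrl_pos] |= (unsigned char)ctrl_bit;
               dst[out++] = (unsigned char)(((offset >> 4) & 0xF0) |
                                            (match_len - MIN_MATCH));
               dst[out++] = (unsigned char)(offset & 0xFF);
               in += match_len;
           } else {                      /* literal item: one raw byte   */
               dst[out++] = src[in++];
           }
           ctrl_bit = (ctrl_bit << 1) & 0xFF;
       }
       return out;
   }

The design trades compression for speed: because only one candidate
match is ever examined, no searching is done, and that is what makes
this class of algorithm so fast.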
(STOP PRESS 30-Sep-1991: A patent has arisen that may cover the LZRW*
algorithms. See the section "Gibson and Graybill Patent".)

Of these algorithms, LZRW3-A will probably be of the most interest to
the most people. The LZRW1-LZRW3 algorithms are all tuned to yield high
speed at the expense of compression. LZRW3-A yields similar compression
to Unix compress, while running about twice as fast when compressing and
three times as fast when decompressing. It requires just 16K of memory
(apart from input and output buffers).

DHPC, SAKDC, MMDC
-----------------
These algorithms were written in Ada using a custom preprocessor on a
machine that I no longer have access to. Unless tempted by money, I am
not prepared to retrieve this code and make it public for at least one
year.

LZRW1, LZRW1-A, LZRW2, LZRW3, LZRW3-A
-------------------------------------
Working public domain C code for these patent-free algorithms is
available in the FTP archive described earlier. Implementations of LZRW1
and LZRW1-A in 68000 machine code are also available in the same
archive.

LZRW4-LZRW5
-----------
These two algorithms were released as algorithm descriptions only. To my
knowledge, no implementations currently exist. These two files will be
of interest mainly to those developing data compression algorithms.
Shortly after the release of the LZRW5 description, it was found that
the idea had been invented before. See the LZRW5 file for details.

8. Description of the Data Compression Conference
--------------------------------------------------
In April 1991 the world's first data compression conference was held in
Snowbird, Utah. The conference proceedings have been published by the
IEEE and you can obtain a copy ("DCC91 proceedings") from:

   IEEE Computer Society
   PO Box 3014
   10662 Los Vaqueros Circle
   Los Alamitos CA 90720-1264
   USA
   Ph: +1 (714) 821-8380

The proceedings take the form of a small, strongly bound book and are
essential reading for compressor heads. It may also be possible to
obtain a list of the papers presented and then individual reprints.

I have written a description of the conference. The description includes
a table of contents for the proceedings. The description can be found in
the FTP archive described earlier.

9. Data Compression Interface Standard
--------------------------------------
I have proposed a standard for interfacing data compression algorithms.
There are now so many data compression algorithms, with so many
different characteristics, written in so many different languages, on so
many different systems, that there is a pressing need for a standardized
data compression algorithm interface. A standardized interface would
yield the following benefits:

* The standard interface would allow programmers to code for generic
  data compression and then quickly and easily swap between various
  algorithms until they find the one best suited to their application.
  Writing glue code may be trivial, but if one has to do it twenty
  times, it can become prohibitively time consuming.

* A standard interface would assist those cataloguing and comparing
  algorithms.

* A standard interface would encourage the designers of data compression
  algorithms to provide conforming implementations.

* A standard interface would make it easy for programmers to write
  programs that select from a range of data compression algorithms at
  run time.

The proposed standards documents can be found in the FTP archive
mentioned earlier. There are two standards in the archive. The first
attracted enough criticism to warrant a rewrite. The second "stream"
standard is currently stalled pending further revision.
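Purely by way of illustration, here is a hypothetical C sketch of the
kind of uniform interface such a standard might define. None of the
names or fields below are taken from the proposed standard itself; for
the actual proposals, see the dc_stan_spec and dc_stan_stream files in
the FTP archive.

   /* dc_iface_sketch.h -- a hypothetical illustration of the kind of
      uniform C interface a data compression standard might define. This
      is NOT the proposed standard; all names and fields are invented.   */

   #include <stddef.h>

   /* Static description of an algorithm, so that catalogues and
      run-time menus can be built without knowing its internals.         */
   typedef struct {
       const char *name;          /* e.g. "LZRW3-A"                      */
       size_t      work_memory;   /* bytes of working memory required    */
   } dc_identity;

   /* Every conforming algorithm would supply these operations with
      exactly these signatures, so a caller can swap algorithms by
      swapping a single structure instead of rewriting glue code.        */
   typedef struct {
       dc_identity identity;
       void (*compress)  (void *work_mem,
                          const unsigned char *src, size_t src_len,
                          unsigned char *dst, size_t *p_dst_len);
       void (*decompress)(void *work_mem,
                          const unsigned char *src, size_t src_len,
                          unsigned char *dst, size_t *p_dst_len);
   } dc_algorithm;

With an interface of this kind, a program that selects an algorithm at
run time needs nothing more than a table of such structures to index
into.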
10. Data Compression State Of The Art Graph 21-Jun-1991
-------------------------------------------------------
I used the information posted by Peter Gutmann
(peter_gutmann@kcbbs.gen.nz) to plot this performance graph of
compression algorithms by hand. The graph is only rough, but gives a
good idea of what is going on out there. The capital letter of each
compression algorithm's name marks its position on the graph. I didn't
plot algorithms with no listed speed (except for SAKDC, which gives the
best compression).

[Graph: compression in Bits/Byte (5 = BAD at the top, down to 2 = GOOD
at the bottom) versus Compression Speed in Kilo (1024) Bytes Per Second
(0 to 256, doubling scale). Programs plotted: Comp-1, Pack, Lzrw1,
Comp-1a, Brent, Stuffit1, Zoo2, Compress, DwcPkarc, Comp-2(1), Disdoub,
Pak1.00, Stuffitcl, Lharc1.13c, Arj1, Pak2, Pkzip1.10, Compactor,
Lharc2.11, Arj2, and Sakdc. A starred line marks the state of the art,
running from Sakdc (best compression, lowest speed) up to the fastest
programs; the dotted zone beyond it is where we want to go.]

This graph is probably misleading because some of the programs were run
on a Mac-SE and some on a PC386/25.

11. The Puzzle Box Provisional Patent
-------------------------------------
In early 1991, I submitted a provisional patent application for a device
which I call a "puzzle box". The device sits between computers and
critical devices and assists in preventing the accidental activation of
those devices. All the work done on puzzle boxes appears in the
provisional patent application, which can be found in the FTP archive
described earlier. The current state of my puzzle box project is that I
am deciding whether to continue with a full patent application.

12. Gibson and Graybill Patent
------------------------------
In mid-September 1991 a patent was granted that presents a serious
threat to the LZRW* series of algorithms. Here is a recent posting
(abridged) that I made to comp.compression about the patent.

From: ross@spam.ua.oz.au (Ross Williams)
Newsgroups: comp.compression
Subject: Gibson and Graybill patent.
Date: 28 Sep 91 17:08:19 GMT

...

Anyway, my worst fears are realized. This patent strikes at the heart of
the LZRW* series of algorithms, possibly shutting the door on LZRW1,
LZRW1-A, LZRW2, LZRW3, LZRW3-A, and LZRW4. After all the work I have put
into this series, I would be most disappointed to see them all moved
from the public to the private domain.

The bad news is that the application date of 18-Jun-1990 is BEFORE the
date when I published the LZRW1 algorithm (at DCC91 in April 1991,
submitted about November 1990). The good news is that I invented LZRW1
in lateish 1989 (about August, I think) and I can dig up lots of
evidence of this. I have at least the following evidence for my 1989
development, and for later on too:

* Detailed time logs of every 15-minute granularity period that I spent
  developing LZRW1 and what I did.

* A detailed design log indicating all the ideas I came up with in
  developing LZRW1. This includes all the dead ends too.

* Files of early versions of LZRW1 in C and 68000 code dated late 1989.
  Maybe some printouts too.

So there is a chance that we can fight this one.
Maybe a lawyer could comment on the strength of the above.

This is all very maddening to me. I suppose this note will be maddening
to the patent holders and I hope they will not be too upset. However, I
really think that it would be best if this patent was squashed, as it
locks up too much of the LZ77 hierarchy which, up to now, has been
relatively patent free.

Ross Williams
ross@spam.ua.oz.au
29-Sep-1991

#-------------------------------------------------------------------------
#2182704 3156012
#E/ APPARATUS AND METHOD FOR VERY HIGH DATA RATE-COMPRESSION INCORPORATING
#   LOSSLESS DATA COMPRESSION AND EXPANSION UTILIZING A HASHING TECHNIQUE
#Document Type: UTILITY
#Inventors: Gibson Dean K (US); Graybill Mark D (US)
#Assignee: Intersecting Concepts Inc
#                     Applic     Applic    Patent       Issue
#                     Number     Date      Number       Date
#                     ---------  ------    ----------   ------
#Patent:           US 539880     900618    US 5049881   910917
#Priority Applic:  US 539880     900618
#Abstract:
#A method and apparatus for compressing digital data that is represented as
#a sequence of characters drawn from an alphabet. An input data block is
#processed into an output data block composed of sections of variable
#length. Unlike most prior art methods which emphasize the creation of a
#dictionary comprised of a tree with nodes or a set of strings, the present
#invention creates its own pointers from the sequence characters previously
#processed and emphasizes the highest priority on maximizing the data
#rate-compression factor product. The use of previously input data acting as
#the dictionary combined with the use of a hashing algorithm to find
#candidates for string matches and the absence of a traditional string
#matching table and associated search time allows the compressor to very
#quickly process the input data block. Therefore, the result is a high data
#rate-compression factor product achieved due to the absence of any string
#storage table and matches being tested only against one string.
#
#Exemplary Claim:
#1. A compression method for compressing a stream of input data into a
#   compressed stream of output data based on a minimum number of characters
#   in each input data subblock to be compressed, said compression method
#   comprising the steps of: a. initializing a hash table and initializing
#   an SRC pointer; b. processing input data in the order in which the
#   characters in the data appear and hashing input data subblocks of the
#   minimum compression size selected; c. maintaining a hash table which
#   contains at each entry, an SRC pointer which points to a previous
#   subblock which hashed to this hash table entry, such that the
#   possibility of any string of data previously occurring in the input
#   block may be tested by hashing the current subblock to a hash table
#   entry, obtaining the previous SRC pointer contained in that entry, and
#   comparing the two strings of data; d. if the two strings of data match
#   on at least the size of the subblock, then generating a backwards
#   pointer to the previous occurrence of the same string of data and
#   thereby compressing the second occurrence of the string of data; e. if
#   the two strings of data do not match, then storing the string of data as
#   incompressible data; and f. continuing steps b. through e. until the
#   entire input data has been processed.
#
#Class: 341095000
#Class Cross Ref: 341051000; 341055000
#IPC: H03M-007/30
#-------------------------------------------------------------------------
#--
#Mike Doughney            mike@access.digex.com            +1 301 220 2020
#Digital Express Group, 6006 Greenbelt Road #228, Greenbelt, Maryland 20770
#   **** Express Access: Public Access Unix for Washington/Baltimore ****
#   ****        For more information, mail to info@digex.com         ****

13. The "Calgary Corpus"
------------------------
There is an informally standard suite of data files that is used when
comparing the practical performance of text data compression algorithms.
You can get it using anonymous FTP from:

   Machine   : fsa.cpsc.ucalgary.ca [IP=136.159.2.1]
   Directory : /pub/text.compression.corpus/

The corpus was established by Bell, Cleary, and Witten of the University
of Calgary and is used extensively in their book, in my book, and by the
text data compression community in general.

------------- END OF THE ROSS WILLIAMS INFORMATION PACK --------------