
FILE: swedish.words
VERSION: DEC-SRC-92-Apr-05

EDITOR

    Jorge Stolfi <stolfi@src.dec.com>
    DEC Systems Research Center
    
AUTHOR OF ORIGINAL WORDLIST

    Unknown.
  
DESCRIPTION

    The file swedish.words is a list of about 15,000 Swedish words.

    The file has one word per line, and is sorted with sort(1)
    in plain ASCII collating sequence.

    The file is supposed to contain all word inflections and verb
    tenses, but it is still extremely incomplete (as one can deduce
    from its size).

    Proper nouns are capitalized.  Umlauts and circle-accents are
    respectively denoted by a double quote (") and at-sign (@) after
    the modified vowel (A/O/a/o).  Besides the letters [a-zA-Z], the
    file uses only double quotes, at-sign, and newline.
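
    As an illustration, the notation above could be decoded to
    Unicode along the following lines.  This is only a sketch in
    Python: the names ACCENT_PAIRS and decode_word are hypothetical,
    and the sample words in the usage note are illustrative inputs,
    not entries checked against the list.

```python
import re

# Letter-accent pairs as described above, mapped to Unicode letters.
# Assumption: only A/O/a/o carry marks, with '"' meaning umlaut and
# '@' meaning circle-accent (ring).
ACCENT_PAIRS = {
    'a"': "ä", "a@": "å", 'o"': "ö",
    'A"': "Ä", "A@": "Å", 'O"': "Ö",
}

# A vowel from A/O/a/o followed by one of the two marks.
_PAIR_RE = re.compile(r'[AaOo]["@]')


def decode_word(word):
    """Replace each letter-accent pair with its Unicode letter."""
    # Pairs that are not in the table (for example 'o@') are left as-is.
    return _PAIR_RE.sub(
        lambda m: ACCENT_PAIRS.get(m.group(0), m.group(0)), word
    )
```

    For example, decode_word('ba@t') gives 'båt', and a word with
    no marks is returned unchanged.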

AUXILIARY LISTS

    In the same directory as swedish.words you will also find:

    swedish.trash

        A list of 8744 words from the original wordlist that I 
        suspect are incorrect or do not belong in swedish.words.  

        The list consists mostly of (invalid) unaccented versions of
        accented words.  The list also includes abbreviations,
        acronyms, computer slang, obvious typos and misspellings,
        apparently foreign words, and several words that looked
        suspicious to me.

ORIGINAL LISTS 

    The original wordlist from which this file was compiled is listed
    below.  It was obtained by anonymous FTP on 92-Feb-10.

    [1] from: relay.cs.toronto.edu : /doc/Dictionaries
        file: words.swedish.Z
        size: 96169 bytes (200853 bytes uncompressed)

    COMMENTS: The list words.swedish.Z [1] uses the characters {}|[]\
    to represent accented letters.  However, the list also appears to
    include two additional (invalid) versions of every accented word,
    where the umlauts and circle-accents are either missing or encoded
    by digrams (ae/aa/oe/Ae/Aa/Oe). 

COMPILATION PROCESS    

    The file swedish.words is based on the file "words.swedish"
    [1], with the characters {}|[]\ mapped to the letter-accent
    pairs (a"/a@/o"/A"/A@/O").
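
    The character mapping can be sketched as follows.  This is not
    the tool actually used to build the list, only a minimal Python
    illustration; ORIGINAL_TO_PAIRS and recode_word are hypothetical
    names, and the pairing assumes that the two sequences above
    correspond position by position.

```python
# Original characters and their letter-accent replacements, paired in
# the order given in the text:  { } | [ ] \   ->   a" a@ o" A" A@ O"
ORIGINAL_TO_PAIRS = str.maketrans({
    "{": 'a"', "}": "a@", "|": 'o"',
    "[": 'A"', "]": "A@", "\\": 'O"',
})


def recode_word(word):
    """Rewrite one word from the original encoding to letter-accent pairs."""
    return word.translate(ORIGINAL_TO_PAIRS)
```

    For example, recode_word("b}t") gives 'ba@t'.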

    I also eliminated every word that could be an accentless version
    of an accented word. Since I don't know the language, it is
    likely that I deleted some valid words.
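
    One way to carry out such an elimination is sketched below.
    This is an assumption about the procedure, not a record of it:
    the functions strip_accents and split_accentless are hypothetical,
    and the sketch only handles missing accents, not the digram
    spellings mentioned in the comments on [1].

```python
def strip_accents(word):
    """Drop the '"' and '@' marks, leaving only the bare letters."""
    return word.replace('"', "").replace("@", "")


def split_accentless(words):
    """Split words into (kept, removed).

    A word is removed when it equals the stripped form of some
    accented word in the same list.  Accented words are never removed,
    because a stripped form contains no marks.
    """
    accented = [w for w in words if '"' in w or "@" in w]
    shadows = {strip_accents(w) for w in accented}
    kept = [w for w in words if w not in shadows]
    removed = [w for w in words if w in shadows]
    return kept, removed
```

    Note that this rule cannot tell a spurious accentless copy from
    a genuine word that happens to have the same spelling, which is
    why some valid words are likely to be lost.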

(NON-)COPYRIGHT STATUS

  To the best of my knowledge, all the files I used to build these
  wordlists were available for public distribution and use, at least
  for non-commercial purposes.  I have confirmed this assumption with
  the authors of the lists, whenever they were known.
  
  Therefore, it is safe to assume that the wordlists in this package
  can also be freely copied, distributed, modified, and used for
  personal, educational, and research purposes.  (Use of these files in
  commercial products may require written permission from DEC and/or
  the authors of the original lists.)
  
  Whenever you distribute any of these wordlists, please distribute
  also the accompanying README file.  If you distribute a modified
  copy of one of these wordlists, please include the original README
  file with a note explaining your modifications.  Your users will
  surely appreciate that.

(NO-)WARRANTY DISCLAIMER

  These files, like the original wordlists on which they are based,
  are still very incomplete, uneven, and inconsistent, and probably
  contain many errors.  They are offered "as is" without any warranty
  of correctness or fitness for any particular purpose.  Neither I nor
  my employer can be held responsible for any losses or damages that
  may result from their use.

