The $HARVEST_HOME/lib/gatherer directory contains the default summarizers described in Section 4.5.1, plus various utility programs needed by the summarizers and the Gatherer, as follows:
*.sum
*.unnest
*2soif
cksoif
cksoif < INPUT.soif
gatherd. cleandb ensures that all SOIF objects are valid,
and deletes any that are not;
consoldb consolidates n GDBM database files into a single GDBM
database file; expiredb deletes any SOIF objects that are no longer
valid as defined by their Time-To-Live attributes; folddb runs all
of the operations needed to prepare the Gatherer's database for export by
gatherd; mergedb consolidates GDBM files as described in
Section 4.7.7; mkcompressed generates the compressed
cache All-Templates.gz file; mkgathererstats.pl generates the
INFO.soif statistics file;
mkindex generates the cache of timestamps; and
rmbinary removes binary data from a GDBM database.
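The expiry rule that expiredb applies can be pictured with a small sketch. This is not the expiredb source; it assumes that each SOIF object carries an Update-Time attribute (seconds since the epoch) alongside Time-To-Live (seconds), and that an object is stale once their sum lies in the past. The attribute spellings, the units, and the helper name is_expired are all assumptions made for illustration.

```python
import time
from typing import Mapping, Optional

# Assumed attribute names, compared case-insensitively because the
# exact spelling used by a given Gatherer is not confirmed here.
UPDATE_ATTR = "update-time"
TTL_ATTR = "time-to-live"


def is_expired(attrs: Mapping[str, str], now: Optional[float] = None) -> bool:
    """Return True if the object's update time plus its TTL is in the past.

    attrs maps SOIF attribute names to their string values.
    """
    if now is None:
        now = time.time()
    lowered = {name.lower(): value for name, value in attrs.items()}
    try:
        updated = int(lowered[UPDATE_ATTR])
        ttl = int(lowered[TTL_ATTR])
    except (KeyError, ValueError):
        # Assumption: objects without usable timestamps are kept.
        return False
    return updated + ttl < now
```

An object with Update-Time 1000 and Time-To-Live 500 would be treated as expired at any time after 1500 under these assumptions.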
dbcheck checks a URL to see if it has changed since the last time
it was gathered;
enum performs a RootNode enumeration on the given URLs;
fileenum performs a RootNode enumeration on ``file'' URLs;
ftpenum calls
ftpenum.pl to perform a RootNode enumeration on ``ftp'' URLs;
gopherenum performs a RootNode enumeration on ``gopher'' URLs;
httpenum performs a RootNode enumeration on ``http'' URLs;
newsenum performs a RootNode enumeration on ``news'' URLs;
prepurls is a wrapper program used to pipe Gatherer
and essence together;
staturl retrieves LeafNode URLs so that dbcheck
can determine if the URL has been modified or not.
All of these programs are internal to Gatherer.
essence
essence [options] -f input-URLs
or essence [options] URL ...
--dbdir directory       Directory to place database
--full-text             Use entire file instead of summarizing
--gatherer-host         Gatherer-Host value
--gatherer-name         Gatherer-Name value
--gatherer-version      Gatherer-Version value
--help                  Print usage information
--libdir directory      Directory to place configuration files
--log logfile           Name of the file to log messages to
--max-deletions n       Number of GDBM deletions before reorganization
--minimal-bookkeeping   Generates a minimal amount of bookkeeping attrs
--no-access             Do not read contents of objects
--no-keywords           Do not automatically generate keywords
--allowlist filename    File with list of types to allow
--stoplist filename     File with list of types to remove
--tmpdir directory      Name of directory to use for temporary files
--type-only             Only type data; do not summarize objects
--verbose               Verbose output
--version               Version information
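When essence is driven from a script, the command line can be assembled from the options listed above. The following is a minimal sketch that only builds the argument vector for the ``-f input-URLs'' form; the helper name essence_command and its keyword arguments are hypothetical, and only flags shown in the usage summary are used.

```python
import shlex
from typing import List, Optional


def essence_command(
    url_file: str,
    dbdir: Optional[str] = None,
    tmpdir: Optional[str] = None,
    log: Optional[str] = None,
    full_text: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build an argv list of the form: essence [options] -f input-URLs."""
    argv = ["essence"]
    if dbdir:
        argv += ["--dbdir", dbdir]
    if tmpdir:
        argv += ["--tmpdir", tmpdir]
    if log:
        argv += ["--log", log]
    if full_text:
        argv.append("--full-text")
    if verbose:
        argv.append("--verbose")
    # The file of input URLs always comes last in this sketch.
    argv += ["-f", url_file]
    return argv


if __name__ == "__main__":
    # Show the command as it would be typed in a shell.
    print(shlex.join(essence_command("urls.txt", dbdir="data", verbose=True)))
```

The resulting list can be handed to subprocess.run on a host where essence is installed; nothing is executed by the sketch itself.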
extractdb, print-attr
print-attr uses stdin rather than GDBM-file.
extractdb GDBM-file Attribute
gatherd, in.gatherd
gatherd [-db | -index | -log | -zip | -cf file] [-dir dir] port
in.gatherd [-db | -index | -log | -zip | -cf file] [-dir dir]
gdbmutil
gdbmutil consolidate [-d | -D] master-file file [file ...]
gdbmutil delete file key
gdbmutil dump file
gdbmutil fetch file key
gdbmutil keys file
gdbmutil print [-gatherd] file
gdbmutil reorganize file
gdbmutil restore file
gdbmutil sort file
gdbmutil stats file
gdbmutil store file key < data
mktemplate, print-template
print-template can be used to ``normalize'' a SOIF stream;
it reads a stream of SOIF templates from stdin, parses them, then
writes a SOIF stream to stdout.
mktemplate < INPUT.txt > OUTPUT.soif
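The parse-then-rewrite idea behind print-template can be sketched as follows. This is an illustration, not the print-template source. It assumes the usual SOIF layout, in which an object opens with ``@TYPE { URL'', each attribute is written as ``Name{size}:'' followed by a tab and exactly size bytes of value, and the object closes with ``}''. The function names parse_soif and write_soif, and the exact whitespace emitted between objects, are assumptions.

```python
import io
import re
from typing import Iterable, Iterator, List, Tuple

SoifObject = Tuple[str, str, List[Tuple[str, bytes]]]

HEADER = re.compile(rb"@(\S+)\s*\{\s*(\S+)\s*\n")
ATTR = re.compile(rb"([^\s{}:]+)\{(\d+)\}:\t")


def parse_soif(data: bytes) -> Iterator[SoifObject]:
    """Yield (template_type, url, [(attribute, value), ...]) per object."""
    pos = 0
    while True:
        header = HEADER.search(data, pos)
        if header is None:
            return
        ttype, url = header.group(1).decode(), header.group(2).decode()
        pos = header.end()
        attrs: List[Tuple[str, bytes]] = []
        while True:
            # Skip any whitespace between attributes.
            while pos < len(data) and data[pos:pos + 1] in b" \t\r\n":
                pos += 1
            if pos >= len(data) or data[pos:pos + 1] == b"}":
                pos += 1  # step past the closing brace
                break
            attr = ATTR.match(data, pos)
            if attr is None:
                raise ValueError(f"malformed attribute at byte {pos}")
            size = int(attr.group(2))
            start = attr.end()
            # The declared size, not a newline, delimits the value.
            attrs.append((attr.group(1).decode(), data[start:start + size]))
            pos = start + size
        yield ttype, url, attrs


def write_soif(objects: Iterable[SoifObject]) -> bytes:
    """Serialize objects with recomputed sizes and uniform spacing."""
    out = io.BytesIO()
    for ttype, url, attrs in objects:
        out.write(f"@{ttype} {{ {url}\n".encode())
        for name, value in attrs:
            out.write(f"{name}{{{len(value)}}}:\t".encode() + value + b"\n")
        out.write(b"}\n\n")
    return out.getvalue()
```

Because values are read by their declared byte count, a value may itself contain newlines; running write_soif(parse_soif(data)) a second time on its own output leaves the stream unchanged.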
quick-sum
template2db
template2db database [tmpl tmpl...]
wrapit
wrapit [Attribute]