
Utility: tgrep (threaded recursive grep)
Author:  Ron Winacott 
Email:   ronw@canada.sun.com
Phone:   (905) 477-0437 X 340 Or (1-800) 363-6200
Fax:     (905) 477-0217
Address: 140 Renfrew Drive, 
         Suite #206,
         Markham Ontario,
         Canada. L3R 6B3
Group:   Developers Support Center (SMCC) OpCom.


*Introduction.
        
Tgrep is a multi-threaded version of grep. Tgrep supports all of the
normal grep options except -w (word search), plus a few options that
are only available in tgrep. The real change from grep is that tgrep
will recurse down through sub-directories and search all files for the
target string. Tgrep searches files like the following command:

find <start path> -name "<file/directory pattern>" -exec \ (Line wrapped)
	grep <options> <target> /dev/null {} \;

An example of this would be (run from this directory):

% find . -exec grep thr_create /dev/null {} \;
./Solaris/main.c:  if (thr_create(NULL,0,SigThread,NULL,THR_DAEMON,NULL)) {
./Solaris/main.c:          err = thr_create(NULL,0,cascade,(void *)work,
./Solaris/main.c:           err = thr_create(NULL,0,search_thr,(void *)work,
%
Running the same command with timex: 
real        4.26
user        0.64
sys         2.81

The same search run with tgrep would be:
% tgrep thr_create
./Solaris/main.c:  if (thr_create(NULL,0,SigThread,NULL,THR_DAEMON,NULL)) {
./Solaris/main.c:          err = thr_create(NULL,0,cascade,(void *)work,
./Solaris/main.c:           err = thr_create(NULL,0,search_thr,(void *)work,
%
Running the same command with timex:
real        0.79
user        0.62
sys         1.50

In elapsed (real) time, tgrep gets the results more than five times
faster (0.79 versus 4.26 seconds). The numbers above were gathered on
an SS20 running 5.5 (build 18) with four 50MHz CPUs.




*Usage

You can also filter the files that you want tgrep to search, like you
can with find. The next two commands do the same thing; tgrep just
gets it done faster.

find . -name "*.c" -exec grep thr_create /dev/null {} \;
and
tgrep -p '.*\.c$' thr_create

The -p option makes tgrep search only files whose names match the
"regular expression" file pattern string. This option does NOT use
shell wildcard expressions, so to stop tgrep from seeing a file named
foobar.c~ you must add the "$" meta character to the end of the
pattern and escape the real "." character.
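
The effect of the "$" anchor and the escaped "." can be checked with a
small sketch. This is not tgrep's own matcher: it assumes the POSIX
regcomp()/regexec() calls as a stand-in, and name_matches() is a
hypothetical helper written only for this illustration. Whether tgrep
matches against the bare file name or the whole path is also not shown
here.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the tgrep source): returns 1 when
 * `name` matches the POSIX basic regular expression `pattern`, and 0
 * otherwise (including when the pattern fails to compile). */
int name_matches(const char *pattern, const char *name)
{
    regex_t re;
    int matched;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return 0;
    matched = (regexec(&re, name, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

With this helper, the pattern ".*\.c$" accepts foobar.c and rejects
foobar.c~, while the unanchored, unescaped pattern ".*.c" accepts both.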


Some of the other tgrep-only options are -r, -C, -P, -e, -B, -S and
-Z. The -r option stops tgrep from searching any sub-directories; in
other words, it searches only the local directory (-l would have been
the obvious letter, but it was taken). The -C option will search for
and print "continued" lines like you find in a Makefile. Note the
differences in the results of grep and tgrep run in the current
directory.


% grep SUBDIRS Makefile 
SUBDIRS= \
        @for file in ${SUBDIRS}; do \
        @for file in ${SUBDIRS}; do \
        @for file in ${SUBDIRS}; do \
% 

% tgrep -C SUBDIRS Makefile
Makefile:SUBDIRS= \
Makefile:   Common \
Makefile:   Solaris \
Makefile:   Posix
Makefile:   @for file in ${SUBDIRS}; do \
Makefile:       echo "--- Working in $$file (building ${ARCH} version)---"; \
Makefile:       (cd $$file; $(MAKECMD)) ; \
Makefile:   done
Makefile:   @for file in ${SUBDIRS}; do \
Makefile:       echo "--- Working in $$file (building $(ARCH) version)---"; \
Makefile:       (cd $$file; $(MAKECMD) clean ) ; \
Makefile:   done
Makefile:   @for file in ${SUBDIRS}; do \
Makefile:       echo "--- Working in $$file (building $(ARCH) version)---"; \
Makefile:       (cd $$file; $(MAKECMD) clobber ) ; \
Makefile:   done
%

The tgrep output prints the continued lines that ended with the "\"
character. In the case of grep I would not have seen the three values
assigned to SUBDIRS, but tgrep shows them to me (Common, Solaris,
Posix).
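
The test for a "continued" line can be sketched as follows. This is a
minimal illustration and not the code tgrep uses; is_continued() is a
hypothetical name, and it assumes a line is continued when its last
character before the newline is a backslash.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `line` ends with a backslash
 * (ignoring one trailing newline), meaning the next line continues
 * it, and 0 otherwise. */
int is_continued(const char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        len--;
    return len > 0 && line[len - 1] == '\\';
}
```

A caller would keep printing following lines for as long as the line
just printed is reported as continued.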

I use the -P option when I am sending the output of a long search to a
file and want to see the "progress" of the search. The -P option will
print a "." (dot) on stderr for every file (or group of files,
depending on the value of the -P argument) tgrep searches.
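
The idea of one dot per group of files can be sketched like this. The
function name and its parameters are hypothetical, and a threaded
program would also need to protect the file counter with a lock, which
is left out here.

```c
#include <stdio.h>

/* Hypothetical helper: writes one '.' to `out` each time `files_done`
 * (the number of files searched so far, starting at 1) reaches a
 * multiple of `group`. A group of 0 is treated as 1.
 * Returns 1 if a dot was written, 0 otherwise. */
int report_progress(FILE *out, unsigned long files_done,
                    unsigned long group)
{
    if (group == 0)
        group = 1;
    if (files_done % group != 0)
        return 0;
    fputc('.', out);
    fflush(out);
    return 1;
}
```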

The -e option will change the way tgrep uses the target string. Tgrep
uses two different pattern matching systems. The first (without the -e
option) is a literal string match called Boyer-Moore. If the -e option
is used, then an MT-Safe PD version of regular expression is used to
search for the target string as a regexp with meta characters in it. The
regular expression method is slower, but tgrep needed the
functionality. The -Z option will print help on the meta characters tgrep
uses.
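
For readers who have not met Boyer-Moore, here is a minimal sketch of
the idea. It is NOT the routine from the tgrep source: it is the
Horspool simplification, which keeps only the "bad character" shift
table, and bmh_search() is a name made up for this example. The point
it shows is that a literal search can skip ahead by up to the pattern
length instead of checking every position.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool search (simplified Boyer-Moore). Returns a
 * pointer to the first occurrence of `pat` in the first `n` bytes of
 * `text`, or NULL if it does not occur. */
const char *bmh_search(const char *text, size_t n, const char *pat)
{
    size_t m = strlen(pat);
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    if (m == 0)
        return text;
    if (m > n)
        return NULL;

    /* Default: a character not in the pattern lets us skip m bytes. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m;
    /* Characters in the pattern (except its last) give smaller skips. */
    for (i = 0; i + 1 < m; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;

    pos = 0;
    while (pos + m <= n) {
        if (memcmp(text + pos, pat, m) == 0)
            return text + pos;
        /* Skip based on the text byte under the pattern's last slot. */
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return NULL;
}
```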

The -B option tells tgrep to use the value of the environment
variable called TGLIMIT to limit the number of threads it will use
during a search. This option has no effect if TGLIMIT is not
set. Tgrep can "eat" a system alive, so the -B option is a way to run
tgrep on a system without having other users scream at you.
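
Reading a limit such as TGLIMIT from the environment could look like
the sketch below. The helper name, and the choice to treat a missing or
malformed value as "no limit", are assumptions made for this example;
how tgrep itself handles a bad TGLIMIT value is not described here.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a thread limit from `value`, for example
 * the result of getenv("TGLIMIT"). Returns the limit, or 0 (meaning
 * "no limit") when `value` is NULL, empty, not a positive decimal
 * integer, or too large for an int. */
int parse_thread_limit(const char *value)
{
    char *end;
    long n;

    if (value == NULL || *value == '\0')
        return 0;
    errno = 0;
    n = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || n <= 0 || n > INT_MAX)
        return 0;
    return (int)n;
}
```

A caller would use it as parse_thread_limit(getenv("TGLIMIT")), since
getenv() returns NULL when the variable is not set.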

The last new option is -S. If you want to see how things went while
tgrep was searching, you can use this option to print statistics about
the number of files, lines, bytes, matches, threads created, etc.

Here is an example of the -S option's output (again run in the current
directory).

% tgrep -S zimzap

----------------- Tgrep Stats. --------------------
Number of directories searched:           7
Number of files searched:                 37
Number of lines searched:                 9504
Number of matching lines to target:       0
Number of cascade threads created:        7
Number of search threads created:         20
Number of search threads from pool:       17
Search thread pool hit rate:              45.95%
Search pool overall size:                 20
Search pool size limit:                   58
Number of search threads destroyed:       0
Max # of threads running concurrenly:     20
Total run time, in seconds.               1
Work stopped due to no FD's:  (058)       0 Times, 0.00%
Work stopped due to no work on Q:         19 Times, 43.18%
Work stopped due to TGLIMITS: (Unlimited) 0 Times, 0.00%
----------------------------------------------------
%
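
The percentages in this output can be reproduced by hand. The sketch
below is only a check of the arithmetic, with a hypothetical percent()
helper. The pool hit rate matches 17 threads from the pool out of 37
files searched (20 threads created plus 17 reused). The "no work on Q"
figure matches 19 out of 44; the denominator tgrep really uses is not
stated, and 44 is an inference that happens to equal 7 directories plus
37 files.

```c
/* Hypothetical helper: percentage of `part` out of `whole`, or 0.0
 * when `whole` is 0. */
double percent(unsigned long part, unsigned long whole)
{
    if (whole == 0)
        return 0.0;
    return 100.0 * (double)part / (double)whole;
}
```

Here percent(17, 37) comes to about 45.95 and percent(19, 44) to about
43.18, matching the two figures printed above.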

For more information on the usage and options, see the man page
tgrep(1).



*Building Tgrep.

To build tgrep just run "make" in the top directory. This will build
two versions of tgrep, tgrep.Solaris and tgrep.Posix (and some common
code). If you run "make install" the two versions are moved to the
local bin (./bin) directory and are renamed as tgrep.Posix.sparc and
tgrep.Solaris.sparc if you are on SPARC, or tgrep.Posix.i386 and
tgrep.Solaris.i386 if you are on an Intel system.

You can build either the Solaris or the POSIX version of tgrep by
itself by changing directory into the Solaris or Posix sub-directory
and entering "make".

I used the C compiler version "cc: SC3.0.1 13 Jul 1994", so the only
change you may need to make to makefile.master is the CFLAGS
option. I set the -xO4 option, but in older versions of the compiler
this option is -O4, without the "x".

I did not have access to an Intel system running 2.5, so the Intel
versions of tgrep were built using Solaris 2.4 with the EA copy of
Pthreads installed. The SPARC versions were built on 2.5 build 18.



*The Directory layout.

Common:   Holds the code used by both the Solaris threads and Posix
          threads versions of tgrep.
Makefile: The makefile to make all the code, what more can I say.
Posix:    Holds the makefile and code needed to build the POSIX
          threads version of tgrep.
README:   You are looking at it.
Solaris:  Holds the makefile and code needed to build the Solaris
          threads version of tgrep.
bin:      Holds the built applications.
makefile.master: Defines the compiler and other global flags used
          when building tgrep.
man:      Holds the single man page for tgrep (in man1).
