A. The big picture
B. Getting more information
C. Issues related to the C library
D. Problems, weird behaviors, potential bugs
E. Missing functions, wrong types, etc
F. C++ issues
G. Debugging LinuxThreads programs
H. Compiling multithreaded code; errno madness
I. X-Windows and other libraries
J. Signals and threads
K. Internals of LinuxThreads
Multi-threaded programming differs from Unix-style multi-processing in that all threads share the same memory space (and a few other system resources, such as file descriptors), instead of running in their own memory space as is the case with Unix processes.
Threads are useful for two reasons. First, they allow a program to exploit multi-processor machines: the threads can run in parallel on several processors, allowing a single program to divide its work between several processors, thus running faster than a single-threaded program, which runs on only one processor at a time. Second, some programs are best expressed as several threads of control that communicate together, rather than as one big monolithic sequential program. Examples include server programs, overlapping asynchronous I/O, and graphical user interfaces.
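As a minimal sketch of what sharing the same memory space means in practice, several threads can update one global variable directly, provided they serialize access with a mutex. The names `counter`, `count_worker` and `run_shared_counter` are made up for this illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4
#define INCREMENTS  10000

/* Shared by all threads: they all live in the same address space. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *count_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                      /* protected update of shared data */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs the workers to completion and returns the final counter value,
   or -1 if a thread could not be created or joined. */
long run_shared_counter(void)
{
    pthread_t tid[NUM_WORKERS];

    counter = 0;
    for (int i = 0; i < NUM_WORKERS; i++)
        if (pthread_create(&tid[i], NULL, count_worker, NULL) != 0)
            return -1;
    for (int i = 0; i < NUM_WORKERS; i++)
        if (pthread_join(tid[i], NULL) != 0)
            return -1;
    assert(counter == (long)NUM_WORKERS * INCREMENTS);
    return counter;
}
```

With separate Unix processes, each child would have incremented its own private copy of `counter`; here every thread sees the same variable.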
There are also some online tutorials. Follow the links from the LinuxThreads web page: http://pauillac.inria.fr/~xleroy/linuxthreads.
linux-threads@magenet.com.
You can subscribe to the latter by writing
majordomo@magenet.com.
For Linux-specific questions, use comp.os.linux.development.apps and comp.os.linux.development.kernel. The latter is especially appropriate for questions relating to the interface between the kernel and LinuxThreads.
glibcbug script to mail a bug report to the glibc
maintainers. 
If you're using an older libc, or don't have the glibcbug
script on your machine, then e-mail me directly
(Xavier.Leroy@inria.fr).  
In both cases, before sending the bug report, make sure that it is not addressed already in this FAQ. Also, try to send a short program that reproduces the weird behavior you observed.
On the other hand, you probably don't want to read the standard. It's very hard to read, written in standard-ese, and targeted to implementors who already know threads inside-out. A good book on POSIX threads provides the same information in a much more readable form. I can personally recommend Dave Butenhof's book, Programming with POSIX threads (Addison-Wesley). Butenhof was part of the POSIX committee and also designed the Digital Unix implementations of POSIX threads, and it shows.
Another good source of information is the X/Open Group Single Unix specification which is available both on-line and as a book and CD/ROM. That specification includes pretty much all the POSIX standards, including 1003.1c, with some extensions and clarifications.
glibc 2 comes preinstalled as the default C library on several Linux distributions, such as RedHat 5 and up, and Debian 2. Those distributions include the version of LinuxThreads matching glibc 2.
The versions of libc 5 that work best with LinuxThreads are libc 5.2.18 on the one hand, and libc 5.4.12 or later on the other hand. Avoid 5.3.12 and 5.4.7: these have problems with the per-thread errno variable.
Switching an already installed system from libc 5 to glibc 2 is not completely straightforward. See the Glibc2 HOWTO for more information. Much easier is (re-)installing a Linux distribution based on glibc 2, such as RedHat 6.
prep.ai.mit.edu and its many, many mirrors around the world.
See http://www.gnu.org/order/ftp.html
for a list of mirrors.
ftp://sunsite.unc.edu/pub/Linux/devel/GCC/. For the libc 5 version of LinuxThreads, see ftp://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy/linuxthreads/.
linuxthreads/ subdirectory, then recompile
glibc as a whole.  There are now too many inter-dependencies between
LinuxThreads and glibc 2 to allow separate re-compilation of LinuxThreads.
| LinuxThreads | C library | RedHat | 
| 0.7, 0.71 (for libc 5) | libc 5.x | RH 4.2 | 
| 0.7, 0.71 (for glibc 2) | glibc 2.0.x | RH 5.x | 
| 0.8 | glibc 2.1.1 | RH 6.0 | 
| 0.8 | glibc 2.1.2 | not yet released | 
libc_r/dirent.c
        libc_r/dirent.c:94: structure has no member named `dd_lock'
I haven't actually seen this problem, but several users reported it.
My understanding is that something is wrong in the include files of
your Linux installation (/usr/include/*). Make sure
you're using a supported version of the libc 5 library. (See question C.2).
/usr/include/sched.h: there are several occurrences of
_p that the C compiler does not understand.
/usr/include/sched.h that comes with libc 5.3.12 is broken.
Replace it with the sched.h file contained in the
LinuxThreads distribution.  But really you should not be using libc
5.3.12 with LinuxThreads! (See question C.1.)
fdopen() on a file
descriptor opened on a pipe.  When I link it with LinuxThreads,
fdopen() always returns NULL!
pthread_create() no longer returns!
top or ps
display N+2 processes that are running my program. What do all these
processes correspond to?
pthread_create.  That leaves one process
unaccounted for.  That extra process corresponds to the "thread
manager" thread, a thread created internally by LinuxThreads to handle
thread creation and thread termination.  This extra thread is asleep
most of the time.
In LinuxThreads 0.8 and up, pthread_mutex_unlock restarts only
one waiting thread, and pre-assigns the mutex to that thread.  Hence,
if the thread that unlocked the mutex tries to lock it again
immediately, it will block until other waiting threads have had a
chance to lock and unlock the mutex.  This results in much fairer
scheduling.
Notice however that even the old "unfair" behavior is perfectly
acceptable with respect to the POSIX standard: for the default
scheduling policy, POSIX makes no guarantees of fairness, such as "the
thread waiting for the mutex for the longest time always acquires it
first".  Properly written multithreaded code avoids that kind of heavy
contention on mutexes, and does not run into fairness problems.  If
you need scheduling guarantees, you should consider using the
real-time scheduling policies SCHED_RR and
SCHED_FIFO, which have precisely defined scheduling
behaviors. 
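As a rough sketch of how a real-time policy can be requested, the helper below fills in a thread attribute object for SCHED_FIFO. The name `init_fifo_attr` is hypothetical, and the lowest real-time priority is an arbitrary choice for illustration. Creating a thread with such attributes normally requires appropriate privileges, so the sketch stops at preparing the attributes.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: prepares `attr` so that a thread created with it
   requests SCHED_FIFO at the lowest real-time priority.
   Returns 0 on success, or an error number. */
int init_fifo_attr(pthread_attr_t *attr)
{
    struct sched_param param = { 0 };
    int err;

    assert(attr != NULL);
    if ((err = pthread_attr_init(attr)) != 0)
        return err;
    /* Use the policy stored in attr instead of inheriting the creator's. */
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (err == 0)
        err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (err == 0) {
        param.sched_priority = sched_get_priority_min(SCHED_FIFO);
        err = pthread_attr_setschedparam(attr, &param);
    }
    if (err != 0)
        pthread_attr_destroy(attr);
    return err;
}
```

The resulting attribute object would then be passed as the second argument to pthread_create(); if the caller lacks the required privileges, thread creation is expected to fail with an error code rather than silently fall back to the default policy.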
printf() in tight loops, and from the
printout it seems that only one thread is running, the other doesn't
print anything!
The reason for this behavior is explained in
question D.6 above: printf() performs
locking on stdout, and thus your two threads contend very
heavily for the mutex associated with stdout.  But if you
do some real work between two calls to printf(), you'll
see that scheduling becomes much smoother.
<pthread.h>
and there seems to be a gross error in the pthread_cleanup_push
macro: it opens a block with { but does not close it!
Surely you forgot a } at the end of the macro, right?
pthread_cleanup_pop macro.  The POSIX standard
requires pthread_cleanup_push and
pthread_cleanup_pop to be used in matching pairs, at the
same level of brace nesting.  This allows
pthread_cleanup_push to open a block in order to
stack-allocate some data structure, and
pthread_cleanup_pop to close that block.  It's ugly, but
it's the standard way of implementing cleanup handlers.
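A short sketch of the matching-pair rule, assuming a thread that owns a heap buffer; `release_buffer`, `buffer_worker` and `run_cleanup_demo` are illustrative names only. Both macros appear in the same function at the same brace level, so the block opened by the first is closed by the second.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static int cleanup_calls = 0;

/* Cleanup handler: runs on cancellation, on pthread_exit(), or when
   pthread_cleanup_pop() is called with a non-zero argument. */
static void release_buffer(void *p)
{
    free(p);
    cleanup_calls++;
}

static void *buffer_worker(void *arg)
{
    char *buf = malloc(256);

    (void)arg;
    if (buf == NULL)
        return NULL;
    pthread_cleanup_push(release_buffer, buf);  /* opens a block */
    buf[0] = '\0';      /* ... work that may reach a cancellation point ... */
    pthread_cleanup_pop(1);                     /* closes it, runs the handler */
    return NULL;
}

/* Returns how many times the handler ran (expected: once), or -1. */
int run_cleanup_demo(void)
{
    pthread_t tid;

    cleanup_calls = 0;
    if (pthread_create(&tid, NULL, buffer_worker, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    assert(cleanup_calls == 1);
    return cleanup_calls;
}
```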
The problem is fixed in LinuxThreads 0.8.
NR_TASKS in include/linux/tasks.h
and recompiling the kernel.  On the x86 processors at least,
architectural constraints seem to limit NR_TASKS to 4090
at most.
PTHREAD_THREADS_MAX in the LinuxThreads sources
and recompile.
setstackaddr
attribute.
pthread_yield()? How
come LinuxThreads does not implement it?
pthread_yield(),
but then the POSIX guys discovered it was redundant with
sched_yield() and dropped it.  So, just use
sched_yield() instead.
<pthread.h>.
For instance, the second argument to pthread_create()
should be a pthread_attr_t, not a
pthread_attr_t *. Also, didn't you forget to declare
pthread_attr_default?
thr_blah to
pthread_blah.  This is very annoying.  Why did you change
all the function names?
thr_* functions correspond to Solaris
threads, an older thread interface that you'll find only under
Solaris.  The pthread_* functions correspond to POSIX
threads, an international standard available for many, many platforms.
Even Solaris 2.5 and later support the POSIX threads interface.  So,
do yourself a favor and rewrite your code to use POSIX threads: this
way, it will run unchanged under Linux, Solaris, and quite a lot of
other platforms.
thr_suspend() and
thr_resume() functions to do that; why don't you?
Notice that thr_suspend() is inherently dangerous and
prone to race conditions.  For one thing, there is no control on where
the target thread stops: it can very well be stopped in the middle of
a critical section, while holding mutexes.  Also, there is no
guarantee on when the target thread will actually stop.  For these
reasons, you'd be much better off using mutexes and conditions
instead.  The only situations that really require the ability to
suspend a thread are debuggers and some kinds of garbage collectors.
If you really must suspend a thread in LinuxThreads, you can send it a
SIGSTOP signal with pthread_kill. Send
SIGCONT for restarting it.
Beware, this is specific to LinuxThreads and entirely non-portable.
Indeed, a truly conforming POSIX threads implementation will stop all
threads when one thread receives the SIGSTOP signal!
One day, LinuxThreads will implement that behavior, and the
non-portable hack with SIGSTOP won't work anymore.
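For comparison, here is one possible shape of the mutex-and-condition alternative recommended above: the worker checks a "paused" flag at points where it is safe to stop, instead of being stopped at an arbitrary instruction. All names (`pause_gate`, `gate_set_paused`, `gate_wait_if_paused`, `run_pause_demo`) are invented for this sketch.

```c
#include <assert.h>
#include <pthread.h>

struct pause_gate {
    pthread_mutex_t lock;
    pthread_cond_t  resumed;
    int             paused;
};

static struct pause_gate gate = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Called by the controlling thread to pause (1) or resume (0) the worker. */
static void gate_set_paused(int paused)
{
    pthread_mutex_lock(&gate.lock);
    gate.paused = paused;
    if (!paused)
        pthread_cond_broadcast(&gate.resumed);
    pthread_mutex_unlock(&gate.lock);
}

/* Called by the worker only at points where it holds no other lock. */
static void gate_wait_if_paused(void)
{
    pthread_mutex_lock(&gate.lock);
    while (gate.paused)
        pthread_cond_wait(&gate.resumed, &gate.lock);
    pthread_mutex_unlock(&gate.lock);
}

static void *stepper(void *arg)
{
    int *steps = arg;

    for (int i = 0; i < 100; i++) {
        gate_wait_if_paused();          /* safe stopping point */
        (*steps)++;
    }
    return NULL;
}

/* Starts the worker paused, resumes it, and returns the steps it ran. */
int run_pause_demo(void)
{
    pthread_t tid;
    int steps = 0;

    gate_set_paused(1);
    if (pthread_create(&tid, NULL, stepper, &steps) != 0)
        return -1;
    assert(steps == 0);     /* the worker cannot pass the gate while paused */
    gate_set_paused(0);
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return steps;
}
```

Unlike SIGSTOP, the worker here never stops inside a critical section, and the code relies only on standard POSIX calls.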
pthread_attr_setstacksize() and
pthread_attr_setstackaddr()?
Even if pthread_attr_setstacksize() and
pthread_attr_setstackaddr() are now provided, we still
recommend that you do not use them unless you really have strong
reasons for doing so.  The default stack allocation strategy for
LinuxThreads is nearly optimal: stacks start small (4k) and
automatically grow on demand to a fairly large limit (2M).
Moreover, there is no portable way to estimate the stack requirements
of a thread, so setting the stack size yourself makes your program
less reliable and non-portable.
PTHREAD_SCOPE_PROCESS value of the "contentionscope"
attribute.  Why?
PTHREAD_SCOPE_PROCESS.
_POSIX_THREAD_PROCESS_SHARED
before using this facility.
The goal of this extension is to allow different processes (with
different address spaces) to synchronize through mutexes, conditions
or semaphores allocated in shared memory (either SVR4 shared memory
segments or mmap()ed files).
The reason why this does not work in LinuxThreads is that mutexes, conditions, and semaphores are not self-contained: their waiting queues contain pointers to linked lists of thread descriptors, and these pointers are meaningful only in one address space.
Matt Messier and I spent a significant amount of time trying to design a suitable mechanism for sharing waiting queues between processes. We came up with several solutions that combined two of the following three desirable features, but none that combines all three:
pthread_cond_timedwait
clone()" fails.
Until suitable kernel support is available, you'd better use traditional interprocess communications to synchronize different processes: System V semaphores and message queues, or pipes, or sockets.
pthread_create()!
pthread_create().
Recall that pthread_create() is a C function, and it must
be passed a C function as third argument.
If you want to use threads, I can only suggest egcs and glibc. You can find egcs at http://www.cygnus.com/egcs. egcs has libstdc++, which is MT safe under glibc 2. If you really want to use libg++, I have a libg++ add-on for egcs.
http://www.gr.opengroup.org/java/jdk/linux/debug.htm.
Precompiled binaries of the patched gdb are available in RedHat's RPM
format at http://odin.appliedtheory.com/.
Some Linux distributions provide an already-patched version of gdb; others don't. For instance, the gdb in RedHat 5.2 is thread-aware, but apparently not the one in RedHat 6.0. Just ask (politely) the makers of your Linux distribution to please make sure that they apply the correct patches to gdb.
printf() are your best friends.  Try to debug
sequential parts in a single-threaded program first.  Then, put
printf() statements all over the place to get execution traces.
Also, check invariants often with the assert() macro.  In truth,
there is no other effective way (save for a full formal proof of your
program) to track down concurrency bugs.  Debuggers are not really
effective for subtle concurrency problems, because they disrupt
program execution too much.
_REENTRANT defined. What difference does it make?
gethostbyname_r() as a reentrant equivalent to
gethostbyname().
_REENTRANT is defined, some
<stdio.h> functions are no longer defined as macros,
e.g. getc() and putc(). In a multithreaded
program, stdio functions require additional locking, which the macros
don't perform, so we must call functions instead.
<errno.h> redefines errno when
_REENTRANT is
defined, so that errno refers to the thread-specific errno location
rather than the global errno variable.  This is achieved by the
following #define in <errno.h>:
        #define errno (*(__errno_location()))
which causes each reference to errno to call the
__errno_location() function for obtaining the location
where error codes are stored.  libc provides a default definition of
__errno_location() that always returns
&errno (the address of the global errno variable). Thus,
for programs not linked with LinuxThreads, defining
_REENTRANT makes no difference w.r.t. errno processing.
But LinuxThreads redefines __errno_location() to return a
location in the thread descriptor reserved for holding the current
value of errno for the calling thread.  Thus, each thread operates on
a different errno location.
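One way to observe this, sketched under the assumption that the program is compiled and linked for threads: have a second thread record the address that `errno` resolves to, and compare it with the address seen by the calling thread. The function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Records where this thread's errno lives, then overwrites it. */
static void *record_errno_location(void *arg)
{
    uintptr_t *where = arg;

    assert(where != NULL);
    *where = (uintptr_t)&errno;   /* expands to the per-thread location */
    errno = EINTR;                /* touches only this thread's errno */
    return NULL;
}

/* Returns 1 if the new thread used a different errno location than the
   caller, 0 if both used the same one, -1 if the thread could not run. */
int errno_is_per_thread(void)
{
    pthread_t tid;
    uintptr_t thread_location = 0;

    if (pthread_create(&tid, NULL, record_errno_location,
                       &thread_location) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return thread_location != (uintptr_t)&errno;
}
```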
-D_REENTRANT?
getc() or
putc(), it will perform I/O without proper interlocking
of the stdio buffers; this can cause lost output, duplicate output, or
just crash other stdio functions.  If the code consults errno, it will
get back the wrong error code.  The following code fragment is a
typical example:
        do {
          r = read(fd, buf, n);
          if (r == -1) {
            if (errno == EINTR)   /* an error we can handle */
              continue;
            else {                /* other errors are fatal */
              perror("read failed");
              exit(100);
            }
          }
        } while (...);
Assume this code is not compiled with -D_REENTRANT, and
linked with LinuxThreads.  At run-time, read() is
interrupted.  Since the C library was compiled with
-D_REENTRANT, read() stores its error code
in the location pointed to by __errno_location(), which
is the thread-local errno variable.  Then, the code above sees that
read() returns -1 and looks up errno.  Since
_REENTRANT is not defined, the reference to errno
accesses the global errno variable, which is most likely 0.  Hence the
code concludes that it cannot handle the error and stops.
SIGUSR1 and SIGUSR2 in my programs! Why?
LinuxThreads needs two signals for its internal operation. One is used to suspend and restart threads blocked on mutex, condition or semaphore operations. The other is used for thread cancellation.
On ``old'' kernels (2.0 and early 2.1 kernels), there are only 32
signals available and the kernel reserves all of them but two:
SIGUSR1 and SIGUSR2.  So, LinuxThreads has
no choice but to use those two signals.
On recent kernels (2.2 and up), more than 32 signals are provided in
the form of realtime signals. When run on one of those kernels,
LinuxThreads uses two reserved realtime signals for its internal
operation, thus leaving SIGUSR1 and SIGUSR2
free for user code.  (This works only with glibc, not with libc 5.) 
So, you can take the address of an "auto" variable and pass it to other threads via shared data structures. However, you need to make absolutely sure that the function doing this will not return as long as other threads need to access this address. It's the usual mistake of returning the address of an "auto" variable, only made much worse because of concurrency. It's much, much safer to systematically heap-allocate all shared data structures.
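A small sketch of the heap-allocation approach, with made-up names (`struct job`, `start_job`, `finish_job`): the function that starts the thread can return immediately, because the data handed to the thread does not live on its stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct job {
    int id;
    int result;
};

static void *run_job(void *arg)
{
    struct job *j = arg;

    j->result = j->id * 2;      /* stand-in for real work */
    return j;                   /* hand the block back through pthread_join */
}

/* Starts a thread working on a heap-allocated job.  Returns 0 on success.
   It is safe to return right away: nothing on this stack is shared. */
int start_job(pthread_t *tid, int id)
{
    struct job *j = malloc(sizeof *j);

    assert(tid != NULL);
    if (j == NULL)
        return -1;
    j->id = id;
    j->result = 0;
    if (pthread_create(tid, NULL, run_job, j) != 0) {
        free(j);
        return -1;
    }
    return 0;
}

/* Waits for the thread, frees the job, and returns its result (-1 on error). */
int finish_job(pthread_t tid)
{
    void *ret = NULL;
    int result;

    if (pthread_join(tid, &ret) != 0 || ret == NULL)
        return -1;
    result = ((struct job *)ret)->result;
    free(ret);
    return result;
}
```

Had `start_job` passed the address of a local `struct job` instead, the thread could still be reading it after `start_job` returned and its stack frame was reused.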
-D_REENTRANT.  It happens Xlib contains a
piece of code very much like the one in question H.2.  So, your Xlib fetches the error code from the
wrong errno location and concludes that an error it cannot handle
occurred.
You can try to recompile yourself the X libraries with multithreading
options set.  They contain optional support for multithreading; it's
just that the binaries provided by your Linux distribution were built
without this support.  See the file README.Xfree3.3 in
the LinuxThreads distribution for patches and info on how to compile
thread-safe X libraries from the Xfree3.3 distribution.  The Xfree3.3
sources are readily available in most Linux distributions, e.g. as a
source RPM for RedHat.  Be warned, however, that X Windows is a huge
system, and recompiling even just the libraries takes a lot of time
and disk space.
Another, less involved solution is to call X functions only from the
main thread of your program.  Even if all threads have their own errno
location, the main thread uses the global errno variable for its errno
location.  Thus, code not compiled with -D_REENTRANT
still "sees" the right error values if it executes in the main thread
only. 
-D_REENTRANT to avoid
the errno problems explained in question H.2.
-D_REENTRANT is needed.
-D_REENTRANT.
SIGUSR1 and SIGUSR2.  See question H.4.
sigaction(), it sets how the signal is handled not only
for itself, but for all other threads in the program as well.
On the other hand, signal masks are per-thread: each thread chooses
which signals it blocks independently of others.  At thread creation
time, the newly created thread inherits the signal mask of the thread
calling pthread_create().  But afterwards, the new thread
can modify its signal mask independently of its creator thread.
SIGKILL to a
particular thread using pthread_kill, all my threads are
killed!
SIGKILL or SIGINT
when no handler is installed on that signal).  This behavior makes a
lot of sense: when you type "ctrl-C" at the keyboard, or when a thread
crashes on a division by zero or a segmentation fault, you really want
all threads to stop immediately, not just the one that caused the
segmentation violation or that got the SIGINT signal.
(This assumes default behavior for those signals; see question
J.3 if you install handlers for those signals.)
If you're trying to terminate a thread without bringing the whole
process down, use pthread_cancel().
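A minimal sketch of terminating a single thread with pthread_cancel(), assuming the default (deferred) cancellation mode and a target thread that regularly reaches a cancellation point such as sleep(). The names `sleeper` and `run_cancel_demo` are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static void *sleeper(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);       /* sleep() is a cancellation point */
    return NULL;        /* not reached */
}

/* Returns 1 if the thread ended because it was cancelled, -1 on error. */
int run_cancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, sleeper, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    /* A cancelled thread reports PTHREAD_CANCELED as its exit value. */
    if (pthread_join(tid, &result) != 0)
        return -1;
    assert(result == PTHREAD_CANCELED);
    return result == PTHREAD_CANCELED;
}
```

Only the target thread terminates; the calling thread and the rest of the process keep running.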
SIGFPE signal), then the handler is executed by that
thread.  This also applies to signals generated by
raise().
If the signal is sent to a particular thread using
pthread_kill(), then that thread executes the handler.
If the signal is sent via kill() or the tty interface
(e.g. by pressing ctrl-C), then the POSIX specs say that the handler
is executed by any thread in the process that does not currently block
the signal.  In other terms, POSIX considers that the signal is sent
to the process (the collection of all threads) as a whole, and any
thread that is not blocking this signal can then handle it.
The latter case is where LinuxThreads departs from the POSIX specs. In LinuxThreads, there is no real notion of ``the process as a whole'': in the kernel, each thread is really a distinct process with a distinct PID, and signals sent to the PID of a thread can only be handled by that thread. As long as no thread is blocking the signal, the behavior conforms to the standard: one (unspecified) thread of the program handles the signal. But if the thread whose PID receives the signal is blocking it, while some other thread is not, then LinuxThreads simply queues the signal in the first thread and executes the handler only when that thread unblocks the signal, instead of executing the handler immediately in the other thread that does not block it.
This is to be viewed as a LinuxThreads bug, but I currently don't see any way to implement the POSIX behavior without kernel support.
pthread_* functions are not async-signal safe, meaning
that you should not call them from signal handlers.  This
recommendation is not to be taken lightly: your program can deadlock
if you call a pthread_* function from a signal handler!
The only sensible things you can do from a signal handler are to set a
global flag, or call sem_post on a semaphore, to record
the delivery of the signal.  The remainder of the program can then
either poll the global flag, or use sem_wait() and
sem_trywait() on the semaphore.
Another option is to do nothing in the signal handler, and dedicate
one thread (preferably the initial thread) to wait synchronously for
signals, using sigwait(), and send messages to the other
threads accordingly.
sigwait(), other threads no longer receive the signals
sigwait() is waiting for!  What happens?
sigwait().  Basically, it installs signal handlers on all
signals waited for, in order to record which signal was received.
Since signal handlers are shared with the other threads, this
temporarily deactivates any signal handlers you might have previously
installed on these signals.
Though surprising, this behavior actually seems to conform to the
POSIX standard.  According to POSIX, sigwait() is
guaranteed to work as expected only if all other threads in the
program block the signals waited for (otherwise, the signals could be
delivered to other threads than the one doing sigwait(),
which would make sigwait() useless).  In this particular
case, the problem described in this question does not appear.
One day, sigwait() will be implemented in the kernel,
along with other POSIX 1003.1b extensions, and sigwait()
will have a more natural behavior (as well as better performance).
clone() system call, which is a generalization of
fork() allowing the new process to share the memory
space, file descriptors, and signal handlers of the parent.
Advantages of the "one-to-one" model include:
The "many-to-many" model combines both kernel-level and user-level scheduling: several kernel-level threads run concurrently, each executing a user-level scheduler that selects between user threads. Most commercial Unix systems (Solaris, Digital Unix, IRIX) implement POSIX threads this way. This model combines the advantages of both the "many-to-one" and the "one-to-one" model, and is attractive because it avoids the worst-case behaviors of both models -- especially on kernels where context switches are expensive, such as Digital Unix. Unfortunately, it is pretty complex to implement, and requires kernel support which Linux does not provide. Linus Torvalds and other Linux kernel developers have always been pushing the "one-to-one" model in the name of overall simplicity, and are doing a pretty good job of making kernel-level context switches between threads efficient. LinuxThreads is just following the general direction they set.