|  | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> | 
|  | <html><head> | 
|  | <!-- saved from http://www.win.tue.nl/~aeb/linux/lk/lk-10.html --> | 
|  | <meta name="GENERATOR" content="SGML-Tools 1.0.9"><title>The Linux kernel: Processes</title> | 
|  | </head> | 
|  | <body> | 
|  | <hr> | 
|  | <h2><a name="s10">10. Processes</a></h2> | 
|  |  | 
|  | <p>Before looking at the Linux implementation, first a general Unix | 
|  | description of threads, processes, process groups and sessions. | 
|  | </p><p>A session contains a number of process groups, and a process group | 
|  | contains a number of processes, and a process contains a number | 
|  | of threads. | 
|  | </p><p>A session can have a controlling tty. | 
|  | At most one process group in a session can be a foreground process group. | 
|  | An interrupt character typed on a tty ("Teletype", i.e., terminal) | 
|  | causes a signal to be sent to all members of the foreground process group | 
|  | in the session (if any) that has that tty as controlling tty. | 
|  | </p><p>All these objects have numbers, and we have thread IDs, process IDs, | 
|  | process group IDs and session IDs. | 
|  | </p><p> | 
|  | </p><h2><a name="ss10.1">10.1 Processes</a> | 
|  | </h2> | 
|  |  | 
|  | <p> | 
|  | </p><h3>Creation</h3> | 
|  |  | 
|  | <p>A new process is traditionally started using the <code>fork()</code> | 
|  | system call: | 
|  | </p><blockquote> | 
|  | <pre>pid_t p; |
|  |  |
|  | p = fork(); |
|  | if (p == (pid_t) -1) { |
|  |         /* ERROR: no child was created */ |
|  | } else if (p == 0) { |
|  |         /* CHILD */ |
|  | } else { |
|  |         /* PARENT: p is the process ID of the child */ |
|  | } |
|  | </pre> |
|  | </blockquote> | 
|  | <p>This creates a child as a duplicate of its parent. | 
|  | Parent and child are identical in almost all respects. | 
|  | In the code they are distinguished by the fact that the parent | 
|  | learns the process ID of its child, while <code>fork()</code> | 
|  | returns 0 in the child. (The child can find the process ID of its |
|  | parent using the <code>getppid()</code> system call.) |
|  | </p><p> | 
|  | </p><h3>Termination</h3> | 
|  |  | 
|  | <p>Normal termination is when the process does | 
|  | </p><blockquote> | 
|  | <pre>exit(n); | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | or | 
|  | <blockquote> | 
|  | <pre>return n; | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | from its <code>main()</code> procedure. Either way the process returns the single byte |
|  | <code>n</code> (the low eight bits of the value given) to its parent. |
|  | <p>Abnormal termination is usually caused by a signal. | 
|  | </p><p> | 
|  | </p><h3>Collecting the exit code. Zombies</h3> | 
|  |  | 
|  | <p>The parent does | 
|  | </p><blockquote> | 
|  | <pre>pid_t p; | 
|  | int status; | 
|  |  | 
|  | p = wait(&status); | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | and collects two bytes: | 
|  | <p> | 
|  | <figure> | 
|  | <eps file="absent"> | 
|  | <img src="ctty_files/exit_status.png"> | 
|  | </eps> | 
|  | </figure></p><p>A process that has terminated but has not yet been waited for | 
|  | is a <i>zombie</i>. It need only store these two bytes: | 
|  | exit code and reason for termination. | 
|  | </p><p>On the other hand, if the parent dies first, <code>init</code> (process 1) | 
|  | inherits the child and becomes its parent. | 
|  | </p><p> | 
|  | </p><h3>Signals</h3> | 
|  |  | 
|  | <p> | 
|  | </p><h3>Stopping</h3> | 
|  |  | 
|  | <p>Some signals cause a process to stop: | 
|  | <code>SIGSTOP</code> (stop!), | 
|  | <code>SIGTSTP</code> (stop from tty: probably ^Z was typed), | 
|  | <code>SIGTTIN</code> (tty input asked by background process), | 
|  | <code>SIGTTOU</code> (tty output sent by background process, and this was | 
|  | disallowed by <code>stty tostop</code>). | 
|  | </p><p>Apart from ^Z there is also ^Y (delayed suspend, on systems that support it). |
|  | The former stops the process when it is typed, the latter stops it when it is read. |
|  | </p><p>Signals generated by typing the corresponding character on some tty | 
|  | are sent to all processes that are in the foreground process group | 
|  | of the session that has that tty as controlling tty. (Details below.) | 
|  | </p><p>If a process is being traced, every signal except <code>SIGKILL</code> will stop it. |
|  | </p><p> | 
|  | </p><h3>Continuing</h3> | 
|  |  | 
|  | <p><code>SIGCONT</code>: continue a stopped process. | 
|  | </p><p> | 
|  | </p><h3>Terminating</h3> | 
|  |  | 
|  | <p><code>SIGKILL</code> (die! now!), | 
|  | <code>SIGTERM</code> (please, go away), | 
|  | <code>SIGHUP</code> (modem hangup), | 
|  | <code>SIGINT</code> (^C), | 
|  | <code>SIGQUIT</code> (^\), etc. | 
|  | Many signals have as default action to kill the target. | 
|  | (Sometimes with an additional core dump, when such is | 
|  | allowed by rlimit.) | 
|  | The signals <code>SIGCHLD</code> and <code>SIGWINCH</code> | 
|  | are ignored by default. | 
|  | All except <code>SIGKILL</code> and <code>SIGSTOP</code> can be | 
|  | caught or ignored or blocked. | 
|  | For details, see <code>signal(7)</code>. | 
|  | </p><p> | 
|  | </p><h2><a name="ss10.2">10.2 Process groups</a> | 
|  | </h2> | 
|  |  | 
|  | <p>Every process is a member of a unique <i>process group</i>, |
|  | identified by its <i>process group ID</i>. | 
|  | (When the process is created, it becomes a member of the process group | 
|  | of its parent.) | 
|  | By convention, the process group ID of a process group | 
|  | equals the process ID of the first member of the process group, | 
|  | called the <i>process group leader</i>. | 
|  | A process finds the ID of its process group using the system call | 
|  | <code>getpgrp()</code>, or, equivalently, <code>getpgid(0)</code>. | 
|  | One finds the process group ID of process <code>p</code> using | 
|  | <code>getpgid(p)</code>. | 
|  | </p><p>One may use the command <code>ps j</code> to see PPID (parent process ID), | 
|  | PID (process ID), PGID (process group ID) and SID (session ID) | 
|  | of processes. With a shell that does not know about job control, | 
|  | like <code>ash</code>, each of its children will be in the same session | 
|  | and have the same process group as the shell. With a shell that knows | 
|  | about job control, like <code>bash</code>, the processes of one pipeline, like | 
|  | </p><blockquote> | 
|  | <pre>% cat paper | ideal | pic | tbl | eqn | ditroff > out | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | form a single process group. | 
|  | <p> | 
|  | </p><h3>Creation</h3> | 
|  |  | 
|  | <p>A process <code>pid</code> is put into the process group <code>pgid</code> by | 
|  | </p><blockquote> | 
|  | <pre>setpgid(pid, pgid); | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | If <code>pgid == pid</code> or <code>pgid == 0</code> then this creates | 
|  | a new process group with process group leader <code>pid</code>. | 
|  | Otherwise, this puts <code>pid</code> into the already existing | 
|  | process group <code>pgid</code>. | 
|  | A zero <code>pid</code> refers to the current process. | 
|  | The call <code>setpgrp()</code> is equivalent to <code>setpgid(0,0)</code>. | 
|  | <p> | 
|  | </p><h3>Restrictions on setpgid()</h3> | 
|  |  | 
|  | <p>The calling process must be <code>pid</code> itself, or its parent, | 
|  | and the parent can only do this before <code>pid</code> has done | 
|  | <code>exec()</code>, and only when both belong to the same session. | 
|  | It is an error if process <code>pid</code> is a session leader | 
|  | (and this call would change its <code>pgid</code>). | 
|  | </p><p> | 
|  | </p><h3>Typical sequence</h3> | 
|  |  | 
|  | <p> | 
|  | </p><blockquote> | 
|  | <pre>p = fork(); |
|  | if (p == (pid_t) -1) { |
|  |         /* ERROR */ |
|  | } else if (p == 0) {    /* CHILD */ |
|  |         setpgid(0, pgid); |
|  |         ... |
|  | } else {                /* PARENT */ |
|  |         setpgid(p, pgid); |
|  |         ... |
|  | } |
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | This ensures that regardless of whether parent or child is scheduled | 
|  | first, the process group setting is as expected by both. | 
|  | <p> | 
|  | </p><h3>Signalling and waiting</h3> | 
|  |  | 
|  | <p>One can signal all members of a process group: | 
|  | </p><blockquote> | 
|  | <pre>killpg(pgrp, sig); | 
|  | </pre> | 
|  | </blockquote> | 
|  | <p>One can wait for children in one's own process group: |
|  | </p><blockquote> | 
|  | <pre>waitpid(0, &status, ...); | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | or in a specified process group: | 
|  | <blockquote> | 
|  | <pre>waitpid(-pgrp, &status, ...); | 
|  | </pre> | 
|  | </blockquote> | 
|  | <p> | 
|  | </p><h3>Foreground process group</h3> | 
|  |  | 
|  | <p>Among the process groups in a session at most one can be | 
|  | the <i>foreground process group</i> of that session. | 
|  | The tty input and tty signals (signals generated by ^C, ^Z, etc.) | 
|  | go to processes in this foreground process group. | 
|  | </p><p>A process can determine the foreground process group in its session | 
|  | using <code>tcgetpgrp(fd)</code>, where <code>fd</code> refers to its | 
|  | controlling tty. If there is no foreground process group, this returns some value |
|  | larger than 1 that is not currently a process group ID. |
|  | </p><p>A process can set the foreground process group in its session | 
|  | using <code>tcsetpgrp(fd,pgrp)</code>, where <code>fd</code> refers to its | 
|  | controlling tty, and <code>pgrp</code> is a process group in | 
|  | its session, and this session still is associated to the controlling | 
|  | tty of the calling process. | 
|  | </p><p>How does one get <code>fd</code>? By definition, <code>/dev/tty</code> | 
|  | refers to the controlling tty, entirely independent of redirects | 
|  | of standard input and output. (There is also the function | 
|  | <code>ctermid()</code> to get the name of the controlling terminal. | 
|  | On a POSIX standard system it will return <code>/dev/tty</code>.) | 
|  | Opening the name of the | 
|  | controlling tty gives a file descriptor <code>fd</code>. | 
|  | </p><p> | 
|  | </p><h3>Background process groups</h3> | 
|  |  | 
|  | <p>All process groups in a session other than the foreground |
|  | process group are <i>background process groups</i>. |
|  | Since the user at the keyboard is interacting with foreground |
|  | processes, background processes should stay away from the terminal. |
|  | When a background process reads from the terminal it gets | 
|  | a SIGTTIN signal. Normally, that will stop it, the job control shell | 
|  | notices and tells the user, who can say <code>fg</code> to continue | 
|  | this background process as a foreground process, and then this | 
|  | process can read from the terminal. But if the background process | 
|  | ignores or blocks the SIGTTIN signal, or if its process group | 
|  | is orphaned (see below), then the read() returns an EIO error, | 
|  | and no signal is sent. (Indeed, the idea is to tell the process |
|  | that reading from the terminal is not allowed right now. |
|  | If it does not see the signal, it will see the error return instead.) |
|  | </p><p>When a background process writes to the terminal, it may get |
|  | a SIGTTOU signal. This happens only when the TOSTOP flag |
|  | is set (it is off by default). One can set the flag by |
|  | </p><blockquote> | 
|  | <pre>% stty tostop | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | and clear it again by | 
|  | <blockquote> | 
|  | <pre>% stty -tostop | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | and inspect it by | 
|  | <blockquote> | 
|  | <pre>% stty -a | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | For writes the rules differ slightly: if TOSTOP is set but the background |
|  | process ignores or blocks the SIGTTOU signal, then the write() is allowed |
|  | and no signal is sent; if instead its process group is orphaned (see below), |
|  | then the write() returns an EIO error, and no signal is sent. |
|  | <p> | 
|  | </p><h3>Orphaned process groups</h3> | 
|  |  | 
|  | <p>The process group leader is the first member of the process group. |
|  | It may terminate before the others, and then the process group is |
|  | without a leader. |
|  | </p><p>A process group is called <i>orphaned</i> when <i>the | 
|  | parent of every member is either in the process group | 
|  | or outside the session</i>. | 
|  | In particular, the process group of the session leader | 
|  | is always orphaned. | 
|  | </p><p>If termination of a process causes a process group to become | 
|  | orphaned, and some member is stopped, then all are sent first SIGHUP | 
|  | and then SIGCONT. | 
|  | </p><p>The idea is that perhaps the parent of the process group leader | 
|  | is a job control shell. (In the same session but a different | 
|  | process group.) As long as this parent is alive, it can | 
|  | handle the stopping and starting of members in the process group. | 
|  | When it dies, there may be nobody to continue stopped processes. | 
|  | Therefore, these stopped processes are sent SIGHUP, so that they | 
|  | die unless they catch or ignore it, and then SIGCONT to continue them. | 
|  | </p><p>Note that the process group of the session leader is already | 
|  | orphaned, so no signals are sent when the session leader dies. | 
|  | </p><p>Note also that a process group can become orphaned in two ways | 
|  | by termination of a process: either it was a parent and not itself | 
|  | in the process group, or it was the last element of the process group | 
|  | with a parent outside but in the same session. | 
|  | Furthermore, a process group can also become orphaned |
|  | other than by termination of a process, namely when some |
|  | member is moved to a different process group. |
|  | </p><p> | 
|  | </p><h2><a name="ss10.3">10.3 Sessions</a> | 
|  | </h2> | 
|  |  | 
|  | <p>Every process group is in a unique <i>session</i>. | 
|  | (When the process is created, it becomes a member of the session | 
|  | of its parent.) | 
|  | By convention, the session ID of a session | 
|  | equals the process ID of the first member of the session, | 
|  | called the <i>session leader</i>. | 
|  | A process finds the ID of its session using the system call |
|  | <code>getsid(0)</code>; <code>getsid(p)</code> gives the session ID of process <code>p</code>. |
|  | </p><p>Every session may have a <i>controlling tty</i>, |
|  | which is then also called the controlling tty of each of |
|  | its member processes. |
|  | A file descriptor for the controlling tty is obtained by | 
|  | opening <code>/dev/tty</code>. (And when that fails, there was no | 
|  | controlling tty.) Given a file descriptor for the controlling tty, | 
|  | one may obtain the SID using <code>tcgetsid(fd)</code>. | 
|  | </p><p>A session is often set up by a login process. The terminal | 
|  | on which one is logged in then becomes the controlling tty | 
|  | of the session. All processes that are descendants of the | 
|  | login process will in general be members of the session. | 
|  | </p><p> | 
|  | </p><h3>Creation</h3> | 
|  |  | 
|  | <p>A new session is created by | 
|  | </p><blockquote> | 
|  | <pre>pid = setsid(); | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | This is allowed only when the current process is not a process group leader. | 
|  | In order to be sure of that we fork first: | 
|  | <blockquote> | 
|  | <pre>p = fork(); | 
|  | if (p) exit(0); | 
|  | pid = setsid(); | 
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | The result is that the current process (with process ID <code>pid</code>) | 
|  | becomes session leader of a new session with session ID <code>pid</code>. | 
|  | Moreover, it becomes process group leader of a new process group. | 
|  | Both session and process group contain only the single process <code>pid</code>. | 
|  | Furthermore, this process has no controlling tty. | 
|  | <p>The restriction that the current process must not be a process group leader | 
|  | is needed: otherwise its PID serves as PGID of some existing process group | 
|  | and cannot be used as the PGID of a new process group. | 
|  | </p><p> | 
|  | </p><h3>Getting a controlling tty</h3> | 
|  |  | 
|  | <p>How does one get a controlling terminal? Nobody knows, | 
|  | this is a great mystery. | 
|  | </p><p>The System V approach is that the first tty opened by the process | 
|  | becomes its controlling tty. | 
|  | </p><p>The BSD approach is that one has to explicitly call | 
|  | </p><blockquote> | 
|  | <pre>ioctl(fd, TIOCSCTTY, arg);      /* arg is 0 or 1 */ |
|  | </pre> | 
|  | </blockquote> | 
|  |  | 
|  | to get a controlling tty. | 
|  | <p>Linux tries to be compatible with both, as always, and this | 
|  | results in a very obscure complex of conditions. Roughly: | 
|  | </p><p>The <code>TIOCSCTTY</code> ioctl will give us a controlling tty, |
|  | provided that (i) the current process is a session leader, |
|  | and (ii) it does not yet have a controlling tty, and |
|  | (iii) the tty does not already control some other session. |
|  | If it does, the call is an error, unless we are root and the |
|  | third argument is 1: then we steal the tty from that session. |
|  | (With third argument 0 the tty is never stolen.) |
|  | </p><p>Opening some terminal will give us a controlling tty, | 
|  | provided that (i) the current process is a session leader, and | 
|  | (ii) it does not yet have a controlling tty, and | 
|  | (iii) the tty does not already control some other session, and | 
|  | (iv) the open did not have the <code>O_NOCTTY</code> flag, and | 
|  | (v) the tty is not the foreground VT, and | 
|  | (vi) the tty is not the console, and | 
|  | (vii) maybe the tty should not be master or slave pty. | 
|  | </p><p> | 
|  | </p><h3>Getting rid of a controlling tty</h3> | 
|  |  | 
|  | <p>If a process wants to continue as a daemon, it must detach itself | 
|  | from its controlling tty. Above we saw that <code>setsid()</code> | 
|  | will remove the controlling tty. Also the ioctl TIOCNOTTY does this. | 
|  | Moreover, in order not to get a controlling tty again as soon as it | 
|  | opens a tty, the process has to fork once more, to ensure that it |
|  | is not a session leader. Typical code fragment: | 
|  | </p><p> | 
|  | </p><pre>        if ((fork()) != 0) |
|  |                 exit(0); |
|  |         setsid(); |
|  |         if ((fork()) != 0) |
|  |                 exit(0); |
|  | </pre> | 
|  | <p>See also <code>daemon(3)</code>. | 
|  | </p><p> | 
|  | </p><h3>Disconnect</h3> | 
|  |  | 
|  | <p>If the terminal goes away by modem hangup, and the line was not local, | 
|  | then a SIGHUP is sent to the session leader. | 
|  | Any further reads from the gone terminal return EOF. | 
|  | (Or possibly -1 with <code>errno</code> set to EIO.) | 
|  | </p><p>If the terminal is the slave side of a pseudotty, and the master side | 
|  | is closed (for the last time), then a SIGHUP is sent to the foreground | 
|  | process group of the slave side. | 
|  | </p><p>When the session leader dies, a SIGHUP is sent to all processes | 
|  | in the foreground process group. Moreover, the terminal stops being | 
|  | the controlling terminal of this session (so that it can become | 
|  | the controlling terminal of another session). | 
|  | </p><p>Thus, if the terminal goes away and the session leader is | 
|  | a job control shell, then it can handle things for its descendants, | 
|  | e.g. by sending them a SIGHUP in turn. |
|  | If on the other hand the session leader is an innocent process | 
|  | that does not catch SIGHUP, it will die, and all foreground processes | 
|  | get a SIGHUP. | 
|  | </p><p> | 
|  | </p><h2><a name="ss10.4">10.4 Threads</a> | 
|  | </h2> | 
|  |  | 
|  | <p>A process can have several threads. New threads (with the same PID |
|  | as the parent thread) are started using the <code>clone</code> system |
|  | call with the <code>CLONE_THREAD</code> flag. Threads are distinguished |
|  | by a <i>thread ID</i> (TID). An ordinary process has a single thread | 
|  | with TID equal to PID. The system call <code>gettid()</code> returns the | 
|  | TID. The system call <code>tkill()</code> sends a signal to a single thread. | 
|  | </p><p>Example: a process with two threads. Both only print PID and TID and exit. | 
|  | (Linux 2.4.19 or later.) | 
|  | </p><pre>% cat << EOF > gettid-demo.c | 
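|  | #define _GNU_SOURCE             /* for clone() and syscall() */ |
|  | #include <sched.h> |
|  | #include <stdio.h> |
|  | #include <unistd.h> |
|  | #include <sys/types.h> |
|  | #include <sys/syscall.h> |
|  |  |
|  | /* call the system call directly, in case libc has no gettid() wrapper */ |
|  | static pid_t my_gettid(void) { |
|  |         return (pid_t) syscall(SYS_gettid); |
|  | } |
|  |  |
|  | static int thread(void *p) { |
|  |         printf("thread: %d %d\n", (int) my_gettid(), (int) getpid()); |
|  |         return 0; |
|  | } |
|  |  |
|  | int main(void) { |
|  |         static unsigned char stack[65536]; |
|  |         int i; |
|  |  |
|  |         /* CLONE_THREAD needs CLONE_SIGHAND, which needs CLONE_VM; |
|  |            the stack grows down, so pass the top of the array */ |
|  |         i = clone(thread, stack + sizeof(stack), |
|  |                   CLONE_VM | CLONE_SIGHAND | CLONE_THREAD, NULL); |
|  |         if (i == -1) |
|  |                 perror("clone"); |
|  |         else |
|  |                 printf("clone returns %d\n", i); |
|  |         printf("parent: %d %d\n", (int) my_gettid(), (int) getpid()); |
|  |         sleep(1);       /* let the thread print before the process exits */ |
|  |         return 0; |
|  | } |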
|  | EOF | 
|  | % cc -o gettid-demo gettid-demo.c | 
|  | % ./gettid-demo | 
|  | clone returns 21826 | 
|  | parent: 21825 21825 | 
|  | thread: 21826 21825 | 
|  | % | 
|  | </pre> | 
|  | <p> | 
|  | </p><p> | 
|  | </p><hr> | 
|  |  | 
|  | </body></html> |