| |
| A Quick Start for Lsof |
| |
| 1. Introduction |
| ================ |
| |
| Agreed, the lsof man page is dense and lsof has a plethora of |
| options. There are examples, but the manual page format buries |
| them at the end. How does one get started with lsof? |
| |
| This file is an attempt to answer that question. It plunges |
| immediately into examples of lsof use to solve problems that |
| involve looking at the open files of Unix processes. |
| |
| |
| Contents |
| |
| 1. Introduction |
| 2. Finding Uses of a Specific Open File |
| 3. Finding Open Files Filling a File System |
| a. Finding an Unlinked Open File |
| 4. Finding Processes Blocking Umount |
| 5. Finding Listening Sockets |
| 6. Finding a Particular Network Connection |
| 7. Identifying a Netstat Connection |
| 8. Finding Files Open to a Named Command |
| 9. Deciphering the Remote Login Trail |
| a. The Fundamentals |
| b. The idrlogin.perl[5] Scripts |
| 10. Watching an Ftp or Rcp Transfer |
| 11. Listing Open NFS Files |
| 12. Listing Files Open by a Specific Login |
| a. Ignoring a Specific Login |
| 13. Listing Files Open to a Specific Process Group |
| 14. When Lsof Seems to Hang |
| a. Kernel lstat(), readlink(), and stat() Blockages |
| b. Problems with /dev or /devices |
| c. Host and Service Name Lookup Hangs |
| d. UID to Login Name Conversion Delays |
| 15. Output for Other Programs |
| 16. The Lsof Exit Code and Shell Scripts |
| 17. Strange messages in the NAME column |
| |
| Options |
| |
| A. Selection Options |
| B. Output Options |
| C. Precautionary Options |
| D. Miscellaneous Lsof Options |
| |
| |
| 2. Finding Uses of a Specific Open File |
| ======================================== |
| |
| Often you're interested in knowing who is using a specific file. |
| You know the path to it and you want lsof to tell you the processes |
| that have open references to it. |
| |
| Simple -- execute lsof and give it the path name of the file of |
| interest -- e.g., |
| |
| $ lsof /etc/passwd |
| |
| Caveat: this only works if lsof has permission to get the status |
| (via stat(2)) of the file at the named path. Unless the lsof |
| process has enough authority -- e.g., it is being run with an |
| effective User ID (UID) of root -- this AIX example won't work: |
| |
| $ lsof /etc/security/passwd |
| lsof: status error on /etc/security/passwd: Permission denied |
| |
| Further caveat: this use of lsof will fail if the stat(2) kernel |
| syscall returns different file parameters -- particularly device |
| and inode numbers -- than lsof finds in kernel node structures. |
| This condition is rare and is usually documented in the 00FAQ |
| file of the lsof distribution. |
| |
| |
| 3. Finding Open Files Filling a File System |
| ============================================ |
| |
| Oh! Oh! /tmp is filling and ls doesn't show that any large files |
| are being created. Can lsof help? |
| |
| Maybe. If there's a process that is writing to a file that has |
| been unlinked, lsof may be able to discover the process for you. |
| You ask it to list all open files on the file system where /tmp |
| is located. |
| |
| Sometimes /tmp is a file system by itself. In that case, |
| |
| $ lsof /tmp |
| |
| is the appropriate command. If, however, /tmp is part of another |
| file system, typically /, then you may have to ask lsof to list |
| all files open on the containing file system and locate the |
| offending file and its process by inspection -- e.g., |
| |
| $ lsof / | more |
| or |
| $ lsof / | grep ... |
| |
| Caveat: there must be a file open to a process for the lsof search |
| to succeed. Sometimes the kernel may cause a file reference to |
| persist, even where there's no file open to a process. (Can you |
| say kernel bug? Maybe.) In any event, lsof won't be able to |
| help in this case. |
| |
| a. Finding an Unlinked Open File |
| ================================= |
| |
| A pesky variant of a file that is filling a file system is an |
| unlinked file to which some process is still writing. When a |
| process opens a file and then unlinks it, the file's resources |
| remain in use by the process, but the file's directory entries |
| are removed. Hence, even when you know the directory where the |
| file once resided, you can't detect it with ls. |
| |
| This can be an administrative problem when the unlinked file is |
| large, and the process that holds it open continues to write to |
| it. Only when the process closes the file will its resources, |
| particularly disk space, be released. |
| |
| Lsof can help you find unlinked files on local disks. It has an |
| option, +L, that will list the link counts of open files. That |
| helps because an unlinked file on a local disk has a zero link |
| count. Note: this is NOT true for NFS files, accessed from a |
| remote server. |
| |
| You could use the option to list all files and look for a zero |
| link count in the NLINK column -- e.g., |
| |
| $ lsof +L |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME |
| ... |
| less 25366 abe txt VREG 6,0 40960 1 76319 /usr/... |
| ... |
| > less 25366 abe 3r VREG 6,0 17360 0 98768 / (/dev/sd0a) |
| |
| Better yet, you can specify an upper bound to the +L option, and |
| lsof will select only files that have a link count less than the |
| upper bound. For example: |
| |
| $ lsof +L1 |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME |
| less 25366 abe 3r VREG 6,0 17360 0 98768 / (/dev/sd0a) |
| |
| You can use lsof's -a (AND) option to narrow the link count search |
| to a particular file system. For example, to look for zero link |
| counts on the /home file system, use: |
| |
| $ lsof -a +L1 /home |
| |
| CAUTION: lsof can't always report link counts for all file types |
| -- e.g., it may not report them for FIFOs, pipes, or sockets. |
| Remember also that link counts for NFS files on an NFS client |
| host don't behave as do link counts for files on local disks. |
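| |
| The +L1 listing can be ranked by size so that the biggest holders |
| of disk space appear first. This is only a sketch: the function |
| name largest_unlinked is made up for this example, and it assumes |
| the ten-column +L layout shown above, with SIZE/OFF in column 7, |
| NLINK in column 8, and no blanks inside the DEVICE column. |
| |

```shell
# largest_unlinked [file_system]
# Hypothetical helper: print "SIZE PID COMMAND" for open files whose
# link count is zero, largest first.  Assumes the +L column layout
# shown above; adjust the field numbers if your dialect differs.
largest_unlinked() {
    if [ $# -gt 0 ]; then
        # -a ANDs the link count test with the file system test
        lsof -a +L1 "$1"
    else
        lsof +L1
    fi |
    awk 'NR > 1 && $8 == 0 { print $7, $2, $1 }' |
    sort -rn
}
```

| |
| For example, largest_unlinked /home would restrict the search to |
| the /home file system. |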
| |
| |
| 4. Finding Processes Blocking Umount |
| ===================================== |
| |
| When you need to unmount a file system with the umount command, |
| you may find the operation blocked by a process that has a file |
| open on the file system. Lsof may be able to help you find the |
| process. In response to: |
| |
| $ lsof <file_system_name> |
| |
| Lsof will display all open files on the named file system. It |
| will also set its exit code zero when it finds some open files |
| and non-zero when it doesn't, making this type of lsof call |
| useful in shell scripts. (See section 16.) |
| |
| Consult the output of the df command for file system names. |
| |
| See the caveat in the preceding section about file references |
| that persist in the kernel without open file traces. That |
| situation may hamper lsof's ability to help with umount, too. |
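| |
| When the listing is long, it can help to reduce it to one line |
| per process. The following sketch does that; the function name |
| umount_blockers is invented here, and the field numbers assume |
| the default column order (COMMAND, PID, USER, ...). |
| |

```shell
# umount_blockers file_system
# Hypothetical helper: print one "PID COMMAND USER" line for each
# process that lsof reports with an open file on the named file system.
# Returns 1 when lsof exits non-zero (nothing found, or an error).
umount_blockers() {
    out=$(lsof "$1" 2>/dev/null) || return 1
    printf '%s\n' "$out" | awk 'NR > 1 { print $2, $1, $3 }' | sort -u
}
```

| |
| A call such as umount_blockers /home lists the processes to |
| investigate before retrying umount. |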
| |
| |
| 5. Finding Listening Sockets |
| ============================= |
| |
| Sooner or later you may wonder if someone has installed a network |
| server that you don't know about. Lsof can list for you all the |
| network socket files open on your machine with: |
| |
| $ lsof -i |
| |
| The -i option without further qualification lists all open Internet |
| socket files. You can add network names or addresses, protocol |
| names, and service names or port numbers to the -i option to |
| refine the search. (See the next section.) |
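| |
| To narrow the listing to sockets that are waiting for connections, |
| one possible sketch follows. It assumes that your lsof reports TCP |
| state and marks listening sockets with ``(LISTEN)'' in the NAME |
| column; the function name listening_tcp is made up for this |
| example. |
| |

```shell
# listening_tcp
# Hypothetical helper: show the header line plus TCP sockets whose
# NAME column carries the (LISTEN) state.
# -n and -P skip host and service name lookups, so addresses and
# ports are reported as numbers.
listening_tcp() {
    lsof -nP -iTCP | awk 'NR == 1 || /\(LISTEN\)/'
}
```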
| |
| |
| 6. Finding a Particular Network Connection |
| =========================================== |
| |
| When you know the source or destination of a network connection |
| whose open files and process you'd like to identify, the -i option |
| may help. |
| |
| If, for example, you want to know what process has a connection |
| open to or from the Internet host named aaa.bbb.ccc, you can ask |
| lsof to search for it with: |
| |
| $ lsof -i@aaa.bbb.ccc |
| |
| If you're interested in a particular protocol -- TCP or UDP -- |
| and a specific port number or service name, you can add those |
| discriminators to the -i information: |
| |
| $ lsof -iTCP@aaa.bbb.ccc:ftp-data |
| |
| If you're interested in a particular IP version -- IPv4 or IPv6 |
| -- and your UNIX dialect supports both (it does if "IPv[46]" |
| appears in the lsof -h output), you can add the '4' or '6' |
| selector immediately after -i: |
| |
| $ lsof -i4 |
| $ lsof -i6 |
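| |
| The pieces described in this section combine in a fixed order: IP |
| version, protocol, @host, then :service or :port. As a sketch, a |
| small function can assemble the argument; i_spec is an invented |
| name, not part of lsof. |
| |

```shell
# i_spec version protocol host port
# Hypothetical helper: assemble an argument for lsof's -i option.
# Pass "" for any part you want to leave out.
i_spec() {
    spec="${1:-}${2:-}"
    if [ -n "${3:-}" ]; then spec="$spec@$3"; fi
    if [ -n "${4:-}" ]; then spec="$spec:$4"; fi
    printf '%s\n' "$spec"
}

# Possible use, with the host name from the examples above:
#   lsof -i"$(i_spec 4 TCP aaa.bbb.ccc ftp-data)"
```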
| |
| |
| 7. Identifying a Netstat Connection |
| ==================================== |
| |
| How do I identify the process that has a network connection |
| described in netstat output? For example, if netstat says: |
| |
| Proto Recv-Q Send-Q Local Address Foreign Address (state) |
| tcp 0 0 vic.1023 ipscgate.login ESTABLISHED |
| |
| What process is connected to service name ``login'' on ipscgate? |
| |
| Use lsof's -i option: |
| |
| $ lsof -iTCP@ipscgate:login |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| rlogin 25023 abe 3u inet 0x10144168 0t184 TCP lsof.itap.purdue.edu:1023->ipscgate.cc.purdue.edu:login |
| ... |
| |
| There's another way. Notice the 0x10144168 in the DEVICE column |
| of the lsof output? That's the protocol control block (PCB) |
| address. Many netstat applications will display it when given |
| the -A option: |
| |
| $ netstat -A |
| PCB Proto Recv-Q Send-Q Local Address Foreign Address (state) |
| 10144168 tcp 0 0 vic.1023 ipscgate.login ESTABLISHED |
| ... |
| |
| Using the PCB address, lsof, and grep, you can find the process this |
| way, too: |
| |
| $ lsof -i | grep 10144168 |
| rlogin 25023 abe 3u inet 0x10144168 0t184 TCP lsof.itap.purdue.edu:1023->ipscgate.cc.purdue.edu:login |
| ... |
| |
| If the file is a UNIX socket and netstat reveals an address for it, |
| like this Solaris 11 example: |
| |
| $ netstat -a -f unix |
| Active UNIX domain sockets |
| Address Type Vnode Conn Local Addr Remote Addr |
| ffffff0084253b68 stream-ord 0000000 0000000 |
| |
| then using lsof's -U option and piping its output to a grep on the |
| address yields: |
| |
| $ lsof -U | grep ffffff0084253b68 |
| squid 1638 nobody 12u unix 18,98 0t10 9437188 /devices/pseudo/tl@0:ticots->0xffffff0084253b68 stream-ord |
| |
| |
| 8. Finding Files Open to a Named Command |
| ========================================= |
| |
| When you want to look at the files open to a particular command, |
| you can look up the PID of the process running the command and |
| use lsof's -p option to specify it. |
| |
| $ lsof -p <PID> |
| |
| However, there's a quicker way, using lsof's -c option, provided |
| you don't mind seeing output for every process running the named |
| command. |
| |
| $ lsof -c <first_characters_of_command_name_that_interest_you> |
| |
| The lsof -c option is useful when you want to see how many instances |
| of a given command are executing and what their open files are. |
| One useful example is for the sendmail command. |
| |
| $ lsof -c sendmail |
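| |
| If you only want the number of instances, the PID column can be |
| reduced to a count. A minimal sketch, assuming the default column |
| order with PID in column 2 (count_instances is a made-up name): |
| |

```shell
# count_instances name
# Hypothetical helper: print the number of distinct PIDs that lsof -c
# reports for commands whose names begin with "name".
count_instances() {
    lsof -c "$1" | awk 'NR > 1 { print $2 }' | sort -u | wc -l | tr -d ' '
}
```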
| |
| |
| 9. Deciphering the Remote Login Trail |
| ====================================== |
| |
| If the network connection you're interested in tracing has been |
| initiated externally and is connected to an rlogind, sshd, or |
| telnetd process, asking lsof to identify that process might not |
| give a wholly satisfying answer. The report may be that the |
| connection exists, but to a process owned by root. |
| |
| a. The Fundamentals |
| ==================== |
| |
| How do you get from there to the login name really using the |
| connection? You have to know a little about how real and pseudo |
| ttys are paired in your system, and then use several lsof probes |
| to identify the login. |
| |
| This example comes from a Solaris 2.4 system, named klaatu.cc. |
| I've logged on to it via rlogin from lsof.itap. The first lsof |
| probe, |
| |
| $ lsof -i@lsof.itap |
| |
| yields (among other things): |
| |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| in.rlogin 7362 root 0u inet 0xfc0193b0 0t242 TCP klaatu.cc.purdue.edu:login->lsof.itap.purdue.edu:1023 |
| ... |
| |
| This confirms that a connection exists. A second lsof probe |
| shows: |
| |
| $ lsof -p7362 |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| ... |
| in.rlogin 7362 root 0u inet 0xfc0193b0 0t242 TCP klaatu.cc.purdue.edu:login->lsof.itap.purdue.edu:1023 |
| ... |
| in.rlogin 7362 root 3u VCHR 23, 0 0t66 52928 /devices/pseudo/clone@0:ptmx->pckt->ptm |
| |
| 7362 is the Process ID (PID) of the in.rlogin process, discovered |
| in the first lsof probe. (I've abbreviated the output to simplify |
| the example.) Now comes a need to understand Solaris pseudo-ttys. |
| The key indicator is in the DEVICE column for FD 3, the major/minor |
| device number of 23,0. This translates to /dev/pts/0, so a third |
| lsof probe, |
| |
| $ lsof /dev/pts/0 |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| ksh 7364 abe 0u VCHR 24, 0 0t2410 53410 /dev/pts/../../devices/pseudo/pts@0:0 |
| |
| shows in part that login abe has a ksh process on /dev/pts/0. |
| (The NAME that lsof shows is not /dev/pts/0 but the full expansion |
| of the symbolic link that lsof finds at /dev/pts/0.) |
| |
| Here's a second example, done on an HP-UX 9.01 host named ghg.ecn. |
| Again, I've logged on to it from lsof.itap, so I start with: |
| |
| $ lsof -i@lsof.itap |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| rlogind 10214 root 0u inet 0x041d5f00 0t1536 TCP ghg.ecn.purdue.edu:login->lsof.itap.purdue.edu:1023 |
| ... |
| |
| Then, |
| |
| $ lsof -p10214 |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| ... |
| rlogind 10214 root 0u inet 0x041d5f00 0t2005 TCP ghg.ecn.purdue.edu:login->lsof.itap.purdue.edu:1023 |
| ... |
| rlogind 10214 root 3u VCHR 16,0x000030 0t2037 24642 /dev/ptym/ptys0 |
| |
| Here the key is the NAME /dev/ptym/ptys0. In HP-UX 9.01 tty and |
| pseudo tty devices are paired with names like /dev/ptym/ptys0 |
| and /dev/pty/ttys0, so the following lsof probe is the final step. |
| |
| $ lsof /dev/pty/ttys0 |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| ksh 10215 abe 0u VCHR 17,0x000030 0t3399 22607 /dev/pty/ttys0 |
| ... |
| |
| Here's a third example for an AIX 4.1.4 system. I've used telnet |
| to connect to it from lsof.itap.purdue.edu. I start with: |
| |
| $ lsof -i@lsof.itap.purdue.edu |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| ... |
| telnetd 15616 root 0u inet 0x05a93400 0t5156 TCP cloud.cc.purdue.edu:telnet->lsof.itap.purdue.edu:3369 |
| |
| Then I look at the telnetd process: |
| |
| $ lsof -p15616 |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| ... |
| telnetd 15616 root 0u inet 0x05a93400 0t5641 TCP cloud.cc.purdue.edu:telnet->lsof.itap.purdue.edu:3369 |
| ... |
| telnetd 15616 root 3u VCHR 25, 0 0t5493 103 /dev/ptc/0 |
| |
| Here the key is /dev/ptc/0. In AIX it's paired with /dev/pts/0. |
| The last probe for that shows: |
| |
| $ lsof /dev/pts/0 |
| COMMAND PID USER FD TYPE DEVICE SIZE/OFF INODE NAME |
| ... |
| ksh 16642 abe 0u VCHR 26, 0 0t6461 360 /dev/pts/0 |
| |
| b. The idrlogin.perl[5] Scripts |
| ================================ |
| |
| There's another, perhaps easier, way to go about the job of |
| tracing a network connection. The lsof distribution contains |
| two Perl scripts, idrlogin.perl (Perl 4) and idrlogin.perl5 |
| (Perl 5), that use lsof field output to display values for |
| shells that are parented by rlogind, sshd, or telnetd, or |
| connected directly to TCP sockets. The lsof test suite contains |
| a C library that can be adapted for use with C programs that |
| need to call lsof and process its field output. |
| |
| The two Perl scripts use the lsof -R option; it causes the |
| paRent process ID (PPID) to be listed in the lsof output. The |
| scripts identify all shell processes -- e.g., ones whose command |
| names end in ``sh'' -- and determine if: 1) the ultimate ancestor |
| process before a PID less than 2 (e.g., init's PID is 1) is |
| rlogind, sshd, or telnetd; or 2) the shell process has open |
| TCP socket files. |
| |
| Here's an example of output from idrlogin.perl on a Solaris 2.4 |
| system: |
| |
| centurion: 1 = cd src/lsof4/scripts |
| centurion: 2 = ./idrlogin.perl |
| Login Shell PID Via PID TTY From |
| oboyle ksh 12640 in.telnetd 12638 pts/5 opal.cc.purdue.edu |
| icdtest ksh 15158 in.rlogind 15155 pts/6 localhost |
| sh csh 18207 in.rlogind 18205 pts/1 babylon5.cc.purdue.edu |
| root csh 18242 in.rlogind 18205 pts/1 babylon5.cc.purdue.edu |
| trouble ksh 19208 in.rlogind 18205 pts/1 babylon5.cc.purdue.edu |
| abe ksh 21334 in.rlogind 21332 pts/2 lsof.itap.purdue.edu |
| |
| Each script assumes that its parent directory contains an |
| executable lsof. If you decide to use one of the scripts, you |
| may want to customize it for your local lsof and perl paths. |
| |
| Note that processes executing as remote shells are also |
| identified. |
| |
| Here's another example from a UnixWare 7.1.0 system. |
| |
| tweeker: 1 = cd src/lsof4/scripts |
| tweeker: 9 = ./idrlogin.perl |
| Login Shell PID Via PID TTY From |
| abe ksh 9438 in.telnetd 9436 pts/3 lsof.itap.purdue.edu |
| |
| |
| 10. Watching an Ftp or Rcp Transfer |
| =================================== |
| |
| The nature of the Internet being one of unpredictable performance |
| at times, occasionally you want to know if a file transfer, being |
| done by ftp or rcp, is making any progress. |
| |
| To use lsof for watching a file transfer, you need to know the |
| PID of the file transfer process. You can use ps to find that. |
| Then use lsof, |
| |
| $ lsof -p<PID> |
| |
| to examine the files open to the transfer process. Usually the |
| ftp files of interest are at file descriptors 9 and 10 or 10 and |
| 11; for rcp, 3 and 4. They describe the network socket file and |
| the local data file. |
| |
| If you want to watch only those file descriptors as the file |
| transfer progresses, try these lsof forms (for ftp in the example): |
| |
| $ lsof -p<PID> -ad9,10 -r |
| or |
| $ lsof -p<PID> -ad10,11 -r |
| |
| Some options need explaining: |
| |
| -p<PID> specifies that lsof is to restrict its attention |
| to the process whose ID is <PID>. You can specify |
| a set of PIDs by separating them with commas. |
| |
| $ lsof -p 1234,5678,9012 |
| |
| -a specifies that lsof is to AND its tests together. |
| The two tests that are specified are tests on the |
| PID and tests on file descriptors (``d9,10''). |
| |
| d9,10 specifies that lsof is to test only file descriptors |
| 9 and 10. Note that the `-' is absent, since ``-a'' |
| is a unary option and can be followed immediately |
| by another lsof option. |
| |
| -r tells lsof to list the requested open file information, |
| sleep for a default 15 seconds, then list the open |
| file information again. You can specify a different |
| time (in seconds) after -r and override the default. |
| Lsof issues a short line of equal signs between |
| each set of output to distinguish it. |
| |
| For an rcp transfer, the above example becomes: |
| |
| $ lsof -p<PID> -ad3,4 -r |
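| |
| Progress shows up in the SIZE/OFF column. When lsof reports an |
| offset there, it marks it with a leading ``0t'' (decimal), as in |
| the 0t184 values in earlier examples, or ``0x'' (hexadecimal). A |
| script that compares successive readings needs plain numbers; the |
| sketch below converts them. The name offset_bytes is invented for |
| this example. |
| |

```shell
# offset_bytes value
# Hypothetical helper: convert an lsof SIZE/OFF value to a plain
# decimal number.  A leading "0t" marks a decimal offset, a leading
# "0x" a hexadecimal one; anything else is passed through unchanged.
offset_bytes() {
    case $1 in
        0t*) printf '%s\n' "${1#0t}" ;;   # strip the decimal marker
        0x*) printf '%d\n' "$1" ;;        # printf converts hex constants
        *)   printf '%s\n' "$1" ;;        # already a plain size
    esac
}
```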
| |
| |
| 11. Listing Open NFS Files |
| ========================== |
| |
| Lsof will list all files open on remote file systems, supported |
| by an NFS server. Just use: |
| |
| $ lsof -N |
| |
| Note, however, that when run on an NFS server, lsof will not list |
| files open to the server from one of its clients. That's because |
| lsof can only examine the processes running on the machine where |
| it is called -- i.e., on the NFS server. |
| |
| If you run lsof on the NFS client, using the -N option, it will |
| list files open by processes on the client that are on remote |
| NFS file systems. |
| |
| |
| 12. Listing Files Open by a Specific Login |
| ========================================== |
| |
| If you're interested in knowing what files the processes owned |
| by a particular login name have open, lsof can help. |
| |
| $ lsof -u<login> |
| or |
| $ lsof -u<User ID number> |
| |
| You can specify either the login name or the UID associated with |
| it. You can specify multiple login names and UID numbers, mixed |
| together, by separating them with commas. |
| |
| $ lsof -u548,abe |
| |
| On the subject of login names and UIDs, it's worth noting that |
| lsof can be told to report either. By default it reports login |
| names; the -l option switches reporting to UIDs. You might want |
| to use -l if login name lookup is slow for some reason. |
| |
| a. Ignoring a Specific Login |
| ============================= |
| |
| The -u option can also be used to direct lsof to ignore a |
| specific login name or UID, or a list of them. Simply prefix |
| the login names or UIDs with a `^' character, as you might do |
| in a regular expression. The `^' prefix is useful, for example, |
| when you want to have lsof ignore the files open to system |
| processes, owned by the root (UID 0) login. Try: |
| |
| $ lsof -u ^root |
| or |
| $ lsof -u ^0 |
| |
| |
| 13. Listing Files Open to a Specific Process Group |
| ================================================== |
| |
| There's a Unix collection of processes called a process group. |
| The name indicates that the processes of the group have a common |
| association and are grouped so that a signal sent to one (e.g., |
| a keyboard kill stroke) is delivered to all. |
| |
| This pipeline causes Unix to create a two-element process group: |
| |
| $ lsof | less |
| |
| You can use lsof to look at the open files of all members of a |
| process group, if you know the process group ID number. Assuming |
| that it is 12717 for the above example, this lsof command: |
| |
| $ lsof -g12717 -adcwd |
| |
| would produce on a Solaris 8 system: |
| |
| $ lsof -g12717 -adcwd |
| COMMAND PID PGID USER FD TYPE DEVICE SIZE/OFF NODE NAME |
| sshd 11369 12717 root cwd VDIR 0,2 189 1449175 /tmp (swap) |
| sshd 12717 12717 root cwd VDIR 136,0 1024 2 / |
| |
| The ``-g12717'' option specifies the process group ID of interest; |
| the ``-adcwd'' option specifies that options are to be ANDed and |
| that lsof should limit file output to information about current |
| working directory (``cwd'') files. |
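| |
| If you know a member's PID but not the process group ID, ps can |
| usually supply it. The sketch below assumes a ps that accepts the |
| POSIX ``-o pgid='' and ``-p'' options; pgid_cwds is a made-up |
| function name. |
| |

```shell
# pgid_cwds pid
# Hypothetical helper: look up the process group of "pid" with ps,
# then list the current working directory of every group member.
pgid_cwds() {
    pgid=$(ps -o pgid= -p "$1" | tr -d ' ')
    [ -n "$pgid" ] || return 1      # no such process
    lsof -g"$pgid" -adcwd
}
```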
| |
| |
| 14. When Lsof Seems to Hang |
| =========================== |
| |
| On occasion when you run lsof it seems to hang and produce no |
| output. This may result from system conditions beyond the control |
| of lsof. Lsof has a number of options that may allow you to |
| bypass the blockage. |
| |
| a. Kernel lstat(), readlink(), and stat() Blockages |
| ==================================================== |
| |
| Lsof uses the kernel (system) calls lstat(), readlink(), and |
| stat() to locate mounted file system information. When a file |
| system has been mounted from an NFS server and that server is |
| temporarily unavailable, the calls lsof uses may block in the |
| kernel. |
| |
| Lsof will announce that it is being blocked with warning messages |
| (unless they have been suppressed by the lsof builder), but |
| only after a default waiting period of fifteen seconds has |
| expired for each file system whose server is unavailable. If |
| you have a number of such file systems, the total wait may be |
| unacceptably long. |
| |
| You can do two things to shorten your suffering: 1) reduce the |
| wait time with the -S option; or 2) tell lsof to avoid the |
| kernel calls that might block by specifying the -b option. |
| |
| $ lsof -S 5 |
| or |
| $ lsof -b |
| |
| Avoiding the kernel calls that might block may result in the |
| lack of some information that lsof needs to know about mounted |
| file systems. Thus, when you use -b, lsof warns that it might |
| lack important information. |
| |
| The warnings that result from using -b (unless suppressed by |
| the lsof builder) can themselves be annoying. You can suppress |
| them by adding the -w option. (Of course, if you do, you won't |
| know what warning messages lsof might have issued.) |
| |
| $ lsof -bw |
| |
| Note: if the lsof builder suppressed warning message issuance, |
| you don't need to use -w to suppress them. You can tell what |
| the default state of message warning issuance is by looking at |
| the -h (help) output. If it says ``-w enable warnings'' then |
| warnings are disabled by default; ``-w disable warnings'', they |
| are enabled by default. |
| |
| b. Problems with /dev or /devices |
| ================================== |
| |
| Lsof scans the /dev or /devices branch of your file system to |
| obtain information about your system's devices. (The scan isn't |
| necessary when a device cache file exists.) |
| |
| Sometimes that scan can take a very long time, especially if |
| you have a large number of devices, and if your kernel is |
| relatively slow to process the stat() system call on device |
| nodes. You can't do anything about the stat() system call |
| speed. |
| |
| However, you can make sure that lsof is allowed to use its |
| device cache file feature. When lsof can use a device cache |
| file, it retains information it gleans via the stat() calls |
| on /dev or /devices in a separate file for later, faster |
| access. |
| |
| The device cache file feature is described in the lsof man |
| page. See the DEVICE CACHE FILE, LSOF PERMISSIONS THAT AFFECT |
| DEVICE CACHE FILE ACCESS, DEVICE CACHE FILE PATH FROM THE -D |
| OPTION, DEVICE CACHE PATH FROM AN ENVIRONMENT VARIABLE, |
| SYSTEM-WIDE DEVICE CACHE PATH, PERSONAL DEVICE CACHE PATH |
| (DEFAULT), and MODIFIED PERSONAL DEVICE CACHE PATH sections. |
| |
| There is also a separate file in the lsof distribution, named |
| 00DCACHE, that describes the device cache file in detail, |
| including information about possible security problems. |
| |
| One final observation: don't overlook the possibility that your |
| /dev or /devices tree might be damaged. See if |
| |
| $ ls -R /dev |
| or |
| $ ls -R /devices |
| |
| completes or hangs. If it hangs, then lsof will probably hang, |
| too, and you should try to discover why ls hangs. |
| |
| c. Host and Service Name Lookup Hangs |
| ====================================== |
| |
| Lsof can hang up when it tries to convert an Internet dot-form |
| address to a host name, or a port number to a service name. Both |
| hangs are caused by the lookup functions of your system. |
| |
| An independent check for both types of hangs can be made with |
| the netstat program. Run it without arguments. If it hangs, |
| then it is probably having lookup difficulties. When you run |
| it with -n it shouldn't hang and should report network and port |
| numbers instead of names. |
| |
| Lsof has two options that serve the same purpose as netstat's |
| -n option. The lsof -n option tells it to avoid host name |
| lookups; and -P, service name lookups. Try those options when |
| you suspect lsof may be hanging because of lookup problems. |
| |
| $ lsof -n |
| or |
| $ lsof -P |
| or |
| $ lsof -nP |
| |
| d. UID to Login Name Conversion Delays |
| ======================================= |
| |
| By default lsof converts User IDentification (UID) numbers to |
| login names when it produces output. That conversion process |
| may sometimes hang because of system problems or interlocks. |
| |
| You can tell lsof to skip the lookup with the -l option; it |
| will then report UIDs in the USER column. |
| |
| $ lsof -l |
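| |
| The options from this section can be combined when you don't know |
| which kind of lookup is causing the delay. One way to keep them |
| handy is a small wrapper function; lsof_cautious is an invented |
| name. The -w option is left out because its default state depends |
| on how lsof was built, as noted above. |
| |

```shell
# lsof_cautious [lsof arguments]
# Hypothetical wrapper that combines the options from this section:
#   -b  avoid kernel calls that might block
#   -n  no host name lookups
#   -P  no service name lookups
#   -l  no UID to login name conversion
# Any further arguments are passed to lsof unchanged.
lsof_cautious() {
    lsof -b -n -P -l "$@"
}
```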
| |
| |
| 15. Output for Other Programs |
| ============================= |
| |
| The -F option allows you to specify that lsof should describe |
| open files with a special form of output, called field output, |
| that can be parsed easily by a subsequent program. The lsof |
| distribution comes with sample AWK, Perl 4, and Perl 5 scripts |
| that post-process field output. The lsof test suite has a C |
| library that could be adapted for use by C programs that want to |
| process lsof field output from an in-bound pipe. |
| |
| The lsof manual page describes field output in detail in its |
| OUTPUT FOR OTHER PROGRAMS section. A quick look at a sample |
| script in the scripts/ subdirectory of the lsof distribution will |
| also give you an idea how field output works. |
| |
| The most important thing about field output is that it is relatively |
| homogeneous across Unix dialects. Thus, if you write a script |
| to post-process field output for AIX, it probably will work for |
| HP-UX, Solaris, and Ultrix as well. |
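| |
| As a small illustration of field output, the sketch below asks |
| for the p (PID), c (command name), and n (file name) fields and |
| joins them into one line per open file. Each field arrives on |
| its own line, prefixed by its identifying letter. The function |
| name open_names is made up; check the man page for the fields |
| your lsof supports. |
| |

```shell
# open_names [lsof arguments]
# Hypothetical helper: reduce field output to "PID COMMAND NAME" lines.
# Lines for fields that were not requested here (lsof may also emit
# the f field) are ignored.
open_names() {
    lsof -Fpcn "$@" |
    awk '
        /^p/ { pid = substr($0, 2); next }   # start of a process set
        /^c/ { cmd = substr($0, 2); next }   # command name
        /^n/ { print pid, cmd, substr($0, 2) }
    '
}
```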
| |
| |
| 16. The Lsof Exit Code and Shell Scripts |
| ======================================== |
| |
| When lsof exits successfully it returns an exit code based on |
| the result of its search for specified files. (If no files were |
| specified, then the successful exit code is 0 (zero).) |
| |
| If lsof was asked to search for specific files, including any |
| files on specified file systems, it returns an exit code of 0 |
| (zero) if it found all the specified files and at least one file |
| on each specified file system. Otherwise it returns a 1 (one). |
| |
| If lsof detects an error and makes an unsuccessful exit, it |
| returns an exit code of 1 (one). |
| |
| You can use the exit code in a shell script to search for files |
| on a file system and take action based on the result -- e.g., |
| |
| #!/bin/sh |
| lsof <file_system_name> > /dev/null 2>&1 |
| if test $? -eq 0 |
| then |
| echo "<file_system_name> has some users." |
| else |
| echo "<file_system_name> may have no users." |
| fi |
| |
| |
| 17. Strange messages in the NAME column |
| ======================================= |
| |
| When lsof encounters problems analyzing a particular file, it may |
| put a message in the file's NAME column. Many of those messages |
| are explained in the 00FAQ file of the lsof distribution. |
| |
| So consult 00FAQ first if you encounter a NAME column message you |
| don't understand. (00FAQ is a possible source of information |
| about other unfamiliar things in lsof output, too.) |
| |
| If you can't find help in 00FAQ, you can use grep to look in the |
| lsof source files for the message -- e.g., |
| |
| $ cd .../lsof_4.76_src |
| $ grep "can't identify protocol" *.[ch] |
| |
| The code associated with the message will usually make clear the |
| reason for the message. |
| |
| If you have an lsof source tree that has been processed by the |
| lsof Configure script, you need grep only there. If, however, |
| your source tree hasn't been processed by Configure, you may |
| have to look in the top-level lsof source directory and in the |
| dialects sub-directory for the UNIX dialect you are using -- e.g., |
| |
| $ cd .../lsof_4.76_src |
| $ grep "can't identify protocol" *.[ch] |
| $ cd dialects/Linux |
| $ grep "can't identify protocol" *.[ch] |
| |
| In rare cases you may have to look in the lsof library, too -- |
| e.g., |
| |
| $ cd .../lsof_4.76_src |
| $ grep "can't identify protocol" *.[ch] |
| $ cd dialects/Linux |
| $ grep "can't identify protocol" *.[ch] |
| $ cd ../../lib |
| $ grep "can't identify protocol" *.[ch] |
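| |
| The repeated cd and grep steps can be folded into one loop. This |
| is a sketch only; find_message is an invented name, and the |
| directories you pass depend on your source tree. |
| |

```shell
# find_message message dir...
# Hypothetical helper: search the C sources and headers in each named
# directory for a message and list the files that contain it.
find_message() {
    msg=$1
    shift
    for dir in "$@"; do
        # -F treats the message as a fixed string; -l lists file names
        grep -lF -- "$msg" "$dir"/*.[ch] 2>/dev/null
    done
    return 0
}
```

| |
| From the top-level source directory, a possible call would be: |
| find_message "can't identify protocol" . dialects/Linux lib |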
| |
| |
| Options |
| ======= |
| |
| The following appendices describe the lsof options in detail. |
| |
| |
| A. Selection Options |
| ==================== |
| |
| Lsof has a rich set of options for selecting the files to be |
| displayed. These include: |
| |
| -a tells lsof to AND the set of selection options that |
| are specified. Normally lsof ORs them. |
| |
| For example, if you specify the -p<PID> and -u<UID> |
| options, lsof will display all files for the |
| specified PID or for the specified UID. |
| |
| By adding -a, you specify that the listed files |
| should be limited to PIDs owned by the specified |
| UIDs -- i.e., they match the PIDs *and* the UIDs. |
| |
| $ lsof -p1234 -au 5678 |
| |
| -c specifies that lsof should list files belonging |
| to processes having the associated command name. |
| |
| Hint: if you want to select files based on more than |
| one command name, use multiple -c<name> specifications. |
| |
| $ lsof -clsof -cksh |
| |
| -d tells lsof to select by the associated file descriptor |
| (FD) set. An FD set is a comma-separated list of |
| numbers and the names lsof normally displays in |
| its FD column: cwd, Lnn, ltx, <number>, etc. See |
| the OUTPUT section of the lsof man page for the |
| complete list of possible file descriptors. Example: |
| |
| $ lsof -dcwd,0,1,2 |
| |
| -g tells lsof to select by the associated process |
| group ID (PGID) set. The PGID set is a comma-separated |
| list of PGID numbers. When -g is specified, it also |
| enables the display of PGID numbers. |
| |
| Note: when -g isn't followed by a PGID set, it |
| simply enables the listing of PGIDs for all processes. |
| Examples: |
| |
| $ lsof -g |
| $ lsof -g1234,5678 |
| |
| -i tells lsof to display Internet socket files. If no |
| protocol/address/port specification follows -i, |
| lsof lists all Internet socket files. |
| |
| If a specification follows -i, lsof lists only the |
| socket files whose Internet addresses match the |
| specification. |
| |
| Hint: multiple addresses may be specified with |
| multiple -i options. Examples: |
| |
| $ lsof -iTCP |
| $ lsof -i@lsof.itap.purdue.edu:sendmail |
| |
| -N selects the listing of files mounted on NFS devices. |
| |
| -U selects the listing of socket files in the Unix |
| domain. |
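| |
| As a sketch of how -a combines the selection options above, the |
| hypothetical helper below (pid_sockets is not part of lsof) lists |
| only the Internet socket files of a single process. |
| |
```shell
# Hypothetical helper: list the Internet socket files of one process.
# -a ANDs the -p and -i selections.  Without -a, lsof would OR them
# and list every file of the PID plus every Internet socket it can see.
#   $1 = process ID
pid_sockets() {
    lsof -a -p "$1" -i
}
```
| |
| For example, ``pid_sockets 1234'' runs ``lsof -a -p 1234 -i''. |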
| |
| |
| B. Output Options |
| ================== |
| |
| Lsof has these options to control its output format: |
| |
| -F produce output that can be parsed by a subsequent |
| program. |
| |
| -g print process group IDs (PGIDs). |
| |
| -l list UID numbers instead of login names. |
| |
| -n list network numbers instead of host names. |
| |
| -o always list file offset. |
| |
| -P list port numbers instead of port service names. |
| |
| -s always list file size. |
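| |
| To illustrate what parsing -F output can look like, here is a |
| minimal sketch. It assumes field output arrives one field per |
| line, with the first character identifying the field (p for PID, |
| c for command name, n for file name). The function name |
| parse_lsof_fields is hypothetical. |
| |
```shell
# Hypothetical helper: turn lsof field output into tab-separated
# "PID <tab> command <tab> name" lines.  Reads field output on stdin.
# Fields other than p, c and n (e.g. the f file descriptor field)
# are ignored.
parse_lsof_fields() {
    awk '
        /^p/ { pid = substr($0, 2); cmd = "" }   # new process set
        /^c/ { cmd = substr($0, 2) }             # command name
        /^n/ { printf "%s\t%s\t%s\n", pid, cmd, substr($0, 2) }
    '
}

# Typical use (requires lsof on the system):
#   lsof -Fpcn | parse_lsof_fields
```
| |
| File names that contain newlines would confuse this sketch; the |
| lsof man page describes the field output format in full. |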
| |
| |
| C. Precautionary Options |
| ========================= |
| |
| Lsof uses system functions that can block or take a long time, |
| depending on the health of the Unix dialect supporting it. These |
| include: |
| |
| -b directs lsof to avoid system functions -- e.g., |
| lstat(2), readlink(2), stat(2) -- that might block |
| in the kernel. See the BLOCKS AND TIMEOUTS |
| section of the lsof man page. |
| |
| You might want to use this option when you have |
| a mount from an NFS server that is not responding. |
| |
| -C tells lsof to ignore the kernel's name cache. As |
| a precaution, this option has little effect on |
| lsof performance, but it might be useful if the |
| kernel's name cache is scrambled. (I've never seen |
| that happen.) |
| |
| -D might be used to direct lsof to ignore an existing |
| device cache file and generate a new one from /dev |
| (and /devices). This might be useful if you have |
| doubts about the integrity of an existing device |
| cache file. |
| |
| -l tells lsof to list UID numbers instead of login |
| names -- this is useful when UID to login name |
| conversion is slow or inoperative. |
| |
| -n tells lsof to avoid converting Internet addresses |
| to host names. This might be useful when your |
| host name lookup (e.g., DNS) is inoperative. |
| |
| -O tells lsof to avoid its strategy of forking to |
| perform potentially blocking kernel operations. |
| While the forking allows lsof to detect that a |
| block has occurred (and possibly break it), the |
| fork operation is a costly one. Use the -O option |
| with care, lest your lsof be blocked. |
| |
| -P directs lsof to list port numbers instead of trying |
| to convert them to port service names. This might |
| be useful if port to service name lookups (e.g., |
| via NIS) are slow or failing. |
| |
| -S can be used to change the lstat/readlink/stat |
| timeout interval that governs how long lsof waits |
| for response from the kernel. This might be useful |
| when an NFS server is slow or unresponsive. When |
| lsof times out of a kernel function, it may have |
| less information to display. Example: |
| |
| $ lsof -S2 |
| |
| -w toggles warning messages: it suppresses them if |
| they are enabled by default, or enables them if they |
| are disabled by default. Check the -h (help) output |
| to determine the default. If it says ``-w enable |
| warnings'', warning messages are disabled by |
| default; if it says ``-w disable warnings'', they |
| are enabled by default. |
| |
| This may be a useful option, for example, when you |
| specify -b, if warning messages are enabled, because |
| it will suppress the warning messages lsof issues |
| about avoiding functions that might block in the |
| kernel. |
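| |
| Several of the precautionary options above are often useful |
| together. The wrapper below is only a sketch: the name |
| careful_lsof is hypothetical, and it assumes warning messages are |
| enabled by default, so that -w suppresses them (check the -h |
| output first). |
| |
```shell
# Hypothetical wrapper: run lsof with the precautionary options that
# avoid blocking kernel calls and slow name lookups.
#   -b  avoid lstat(2), readlink(2), stat(2) calls that might block
#   -w  toggle warnings (assumed here to be enabled by default)
#   -n  list network numbers instead of host names
#   -P  list port numbers instead of service names
#   -l  list UID numbers instead of login names
# Any further arguments are passed through to lsof unchanged.
careful_lsof() {
    lsof -b -w -n -P -l "$@"
}
```
| |
| For example, ``careful_lsof -iTCP'' lists TCP socket files |
| without host, service, or login name lookups. |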
| |
| |
| D. Miscellaneous Lsof Options |
| ============================== |
| |
| There are some lsof options that are hard to classify, including: |
| |
| -? these options select help output. |
| -h |
| |
| -F selects field output. Field output is a mode where |
| lsof produces output that can be parsed easily by |
| subsequent programs -- e.g., AWK or Perl scripts. |
| See ``15. Output for Other Programs'' for more |
| information. |
| |
| -k specifies an alternate kernel symbol file -- i.e., |
| where nlist() will get its information. Example: |
| |
| $ lsof -k/usr/crash/vmunix.1 |
| |
| -m specifies an alternate kernel memory file from |
| which lsof will read kernel structures in place |
| of /dev/kmem or kvm_read(). Example: |
| |
| $ lsof -m/usr/crash/vmcore.n |
| |
| -r tells lsof to repeat its scan every 15 seconds (the |
| default when no associated value is specified). A |
| repeat time, different from the default, can follow |
| -r. Example: |
| |
| $ lsof -r30 |
| |
| -v displays information about the building of the |
| lsof executable. |
| |
| -- The double minus sign option may be used to |
| signal the end of options. It's particularly useful |
| when arguments to the last option are optional and |
| you want to supply a file path that could be confused |
| for arguments to the last option. Example: |
| |
| $ lsof -g -- 1 |
| |
| Where `1' is a file path, not PGID 1. |
| |
| |
| Vic Abell <abe@purdue.edu> |
| January 18, 2010 |