| Solaris Kernel Address Filtering in lsof 4.50 and Above |
| |
| Current Filter |
| ============== |
| |
| Lsof revisions 4.49 and below, have exactly one filter: the kernel |
| virtual address is checked against the kernel's virtual address |
| base -- e.g., what's found in the kernel variable kernelbase. For |
| sun4m that's 0xf0000000, for sun4u, 0x10000000. |
| |
| This filter keeps lsof from handing some bad addresses to the |
| kernel, but not all bad addresses. For example, the virtual address |
| 0x657a682e passes this test on a sun4u machine, but on at least |
| one sun4u that virtual address translates to the physical address |
| 0x1cf08c30000, which is the address of a register of a qfe interface |
| on the machine. There is some evidence that a kvm_kread() call for |
| the 0x657a682e address may crash that sun4u. |
| |
| Lsof 4.71 and above use no filter if they detect that /dev/allkmem |
| exists. That is done because, when /dev/allkmem exists, /dev/kmem has |
| address filtering in its device driver. |
| |
| |
| ====================== |
| !!!IMPORTANT UPDATE!!! |
| ====================== |
| |
| In late May 2002 I learned that Sun had reports of other kernel |
| crashes, caused by adb, lsof, and mdb, related to incorrect addresses |
| being supplied to /dev/kmem. (This report was written originally |
| on July 18, 2000.) |
| |
| The problem is described in and fixed or patched: |
| |
| Solaris 7: SPARC kernel patch 106541-20 |
| Intel kernel patch 106542-20 |
| |
| Solaris 8: SPARC kernel patch 108528-14 |
| Intel kernel patch 108529-14 |
| |
| Solaris 9: bug 4344513 |
| |
| So, if you want to be comfortable using lsof (or adb or mdb) with |
| Solaris, install the appropriate Solaris 7 or 8 patches, or upgrade |
| to Solaris 9. |
| |
| Note that these patches provide the /dev/allkmem device, whose presence |
| causes lsof to rely on the address filtering of the /dev/kmem device. |
| |
| |
| New Filters |
| =========== |
| |
| Lsof 4.50 adds additional filters to the kernelbase check. The |
| filters differ, based on the Solaris version: |
| |
| Solaris |
| Version New Filters |
| ======= =========== |
| |
| 2.5 and below none |
| 2.5.1 kvm_physaddr() (-lkvm), caching, llseek(), |
| and /dev/mem |
| 2.6 kvm_physaddr() (-lkvm), caching, llseek(), |
| and /dev/mem |
| 7, 8, and 9 kvm_physaddr() (ioctl()), caching, and |
| kvm_pread() |
| |
| See !!!IMPORTANT NOTICE!! above for |
| information on a Solaris 9 bug report about, |
| or Solaris 7 and 8 kernel patches to the |
| kernel /dev/kmem driver. Those fixes |
| obviate the need for the kernel address |
| filtering described in this report. |
| |
| !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| !!! I STRONGLY RECOMMEND YOU INSTALL !!! |
| !!! THE PATCHES OR UPGRADE TO SOLARIS !!! |
| !!! 9. !!! |
| !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| |
| kvm_physaddr() (-lkvm) |
| ====================== |
| |
| Solaris has an undocumented function called kvm_physaddr() that |
| will convert a kernel virtual address to a kernel physical address. |
| (Until Solaris 7 this function doesn't even have a prototype |
| definition in <kvm.h>.) |
| |
| I have been assured repeatedly by Casper Dik of Sun that this |
| function, when given a kernel virtual address, will produce addresses |
| of physical memory only; it will not produce physical addresses of |
| interface registers, such as the one for the qfe interface. |
| |
| In Solaris 2.5.1 this function runs in application space from within |
| the KVM library. Since it needs to know the components of the |
| kernel's address space map, it must read those from kernel memory |
| each time it is called. That can be time consuming. |
| |
| I'm not sure about kvm_physaddr() for Solaris 2.6. It may still |
| run in application space from within the KVM library, but if so, |
| it is much faster than its 2.5.1 ancestor. |
| |
| kvm_physaddr() (ioctl()) |
| ======================== |
| |
| I'm sure that at Solaris 7 and above kvm_physaddr() has moved inside |
| the kernel and is called with an ioctl(). That makes it much faster |
| than its ancestors. |
| |
| kvm_physaddr() Use |
| ================== |
| |
| Lsof 4.50 for Solaris will use one or the other version of |
| kvm_physaddr() for Solaris 2.5.1, 2.6, 7, and 8. |
| |
| Using it for Solaris 2.5.1 causes lsof to take four times as much |
| real time as it formerly did with only the kernelbase filtering. |
| |
| Caching |
| ======= |
| |
| To recover the performance lost by kvm_physaddr() on Solaris 2.5.1, |
| I added virtual-to-physical address caching to lsof's kernel read |
| function, kread(). This improves Solaris 2.6, 7, and 8 performance, |
| too, but by a smaller amount. |
| |
| It turns out that a typical lsof run may require reading from 16,000 |
| or more different kernel virtual addresses. However, it also turns |
| out that those addresses are contained within about 600 distinct |
| kernel memory pages. |
| |
| To exploit this condition lsof caches each virtual page address |
| that has a corresponding legitimate physical page address for use |
| in checking later addresses. This caching regains all but a bit |
| of the performance loss on Solaris 2.5.1. |
| |
| Caching can provide some performance gain on Solaris 2.6, 7, and |
| 8, but it's not nearly as large as the gain for 2.5.1, and may |
| depend on the machine architecture type. |
| |
| /dev/mem |
| ======== |
| |
| Once lsof has kernel physical addresses, on Solaris 2.5.1 and 2.6 |
| it seeks to those addresses with llseek() and reads from them via |
| the /dev/mem device. This contrasts with lsof's pre-4.50 behavior |
| where it fed kernel virtual addresses to kvm_kread(), letting it |
| and the kernel do the virtual to physical translations -- and |
| letting that combined process crash that one unlucky sun4u via its |
| qfe interface. |
| |
| Using /dev/mem requires no more permission for lsof, but it does |
| require an additional open file descriptor and use of the 64 bit |
| llseek() function. |
| |
| The additional file descriptor is an unfortunate consequence of |
| the KVM library's opacity. The library usually has /dev/kmem open |
| to a file descriptor, but lsof can't easily get at that descriptor, |
| so it opens one of its own. |
| |
| On Solaris 2.6 for one test system, a 4 CPU E4000 sun4u, doing |
| physical kernel address reads from /dev/mem turned out to be faster |
| than using kvm_kread(). It was marginally faster on a sun4d, and |
| marginally slower on two sun4m's. |
| |
| kvm_pread() |
| =========== |
| |
| Even though it is still undocumented, the kvm_physaddr() function |
| is represented by a prototype in the Solaris 7 and 8 <kvm.h>. |
| Additionally useful is another undocumented function, kvm_pread() |
| (for physical read), that also is represented by a <kvm.h> prototype |
| in Solaris 7 and 8. |
| |
| Lsof 4.50 for Solaris 7 and 8 uses kvm_pread() instead of opening |
| a descriptor to /dev/mem, llseek()-ing to physical addresses in |
| it, and using read(2) to obtain physical address contents. The |
| bonus of kvm_pread() is two-fold: 1) it does positioning as well |
| as reading, so there's one less function call; and 2) its combined |
| operation appears to be faster than llseek() plus read() -- or even |
| kvm_kread(). |
| |
| Combined with the virtual-to-physical address caching, the performance |
| boost of kvm_pread() makes lsof faster on Solaris 7 and 8 than |
| previous revisions, using only kernelbase filtering and kvm_kread(). |
| |
| Remaining Risks |
| =============== |
| |
| There may remain some extremely small likelihood that lsof will |
| transmit a bad physical address to the kernel. Here are some |
| possible failure scenarios: |
| |
| * The physical address filters haven't been tested on |
| the machine whose qfe interface was affected. That's |
| because the machine's memory configuration was changed |
| before the test could be run. |
| |
| * The kvm_physaddr() function, especially in Solaris |
| 2.5.1, might fail to map an address correctly. Only |
| Sun can correct this problem. |
| |
| * Because lsof must read the kernel address map from |
| kernel virtual memory to pass it to the Solaris 2.5.1 |
| and 2.6 kvm_physaddr() functions, lsof must use kvm_kread() |
| to read the map. |
| |
| There's also the chance that lsof could pass a stale |
| kernel address map to kvm_physaddr(), because re-reading |
| it for each call to kvm_physaddr() would lead to |
| unacceptable performance. When in repeat mode lsof |
| re-reads the map between each cycle. |
| |
| On Solaris 7 and 8, since kvm_physaddr() is inside the |
| kernel, there's no chance of its having a stale address |
| map. |
| |
| * There's an extremely small chance that a cached |
| virtual+physical page address could become invalid. |
| This is so small I think it can be ignored, since the |
| kernel memory map rarely changes. |
| |
| When in repeat mode, lsof clears its virtual+physical |
| address map between cycles. |
| |
| * Lsof still uses Sun's kvm_getproc() (from -lkvm), and |
| I have no idea what kernel address filtering it does, |
| if any. |
| |
| I wish to acknowledge: Casper Dik of Sun, who provided information |
| about kvm_physaddr() and helped test the lsof changes; Jim Mewes |
| of Phone.com, who reported the initial problem and helped test the |
| lsof changes; and several readers of the lsof-l listserv, who |
| volunteered to run test programs. |
| |
| |
| Vic Abell |
| March 16, 2004 |