| <?xml version="1.0" encoding="ISO-8859-1"?> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> |
| <title>OProfile manual</title> |
| <meta name="generator" content="DocBook XSL Stylesheets V1.75.2" /> |
| </head> |
| <body> |
| <div class="book" title="OProfile manual"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h1 class="title"><a id="oprofile-guide"></a>OProfile manual</h1> |
| </div> |
| <div> |
| <div class="authorgroup"> |
| <div class="author"> |
| <h3 class="author"><span class="firstname">John</span> <span class="surname">Levon</span></h3> |
| <div class="affiliation"> |
| <div class="address"> |
| <p> |
| <code class="email">&lt;<a class="email" href="mailto:levon@movementarian.org">levon@movementarian.org</a>&gt;</code> |
| </p> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div> |
| <p class="copyright">Copyright © 2000-2004 Victoria University of Manchester, John Levon and others</p> |
| </div> |
| </div> |
| <hr /> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="chapter"> |
| <a href="#introduction">1. Introduction</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#applications">1. Applications of OProfile</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#requirements">2. System requirements</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#resources">3. Internet resources</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#install">4. Installation</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#uninstall">5. Uninstalling OProfile</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#overview">2. Overview</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#getting-started">1. Getting started</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#tools-overview">2. Tools summary</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#controlling">3. Controlling the profiler</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#controlling-daemon">1. Using <span class="command"><strong>opcontrol</strong></span></a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opcontrolexamples">1.1. Examples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#eventspec">1.2. Specifying performance counter events</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#setup-jit">2. Setting up the JIT profiling feature</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#setup-jit-jvm">2.1. JVM instrumentation</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#oprofile-gui">3. Using <span class="command"><strong>oprof_start</strong></span></a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#detailed-parameters">4. Configuration details</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#hardware-counters">4.1. Hardware performance counters</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#rtc">4.2. OProfile in RTC mode</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#timer">4.3. OProfile in timer interrupt mode</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#p4">4.4. Pentium 4 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#ia64">4.5. Intel Itanium 2 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#ppc64">4.6. PowerPC64 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#cell-be">4.7. Cell Broadband Engine support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#misuse">4.9. Dangerous counter settings</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#results">4. Obtaining results</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#profile-spec">1. Profile specifications</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#profile-spec-examples">1.1. Examples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#profile-spec-details">1.2. Profile specification parameters</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#no-results">1.4. What to do when you don't get any results</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opreport">2. Image summaries and symbol summaries (<span class="command"><strong>opreport</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-merging">2.1. Merging separate profiles</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-comparison">2.2. Side-by-side multiple results</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-callgraph">2.3. Callgraph output</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-diff">2.4. Differential profiles with <span class="command"><strong>opreport</strong></span></a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-anon">2.5. Anonymous executable mappings</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-xml">2.6. XML formatted output</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-options">2.7. Options for <span class="command"><strong>opreport</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opannotate">3. Outputting annotated source (<span class="command"><strong>opannotate</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opannotate-finding-source">3.1. Locating source files</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opannotate-details">3.2. Usage of <span class="command"><strong>opannotate</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#getting-jit-reports">4. OProfile results with JIT samples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#opgprof">5. <span class="command"><strong>gprof</strong></span>-compatible output (<span class="command"><strong>opgprof</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opgprof-details">5.1. Usage of <span class="command"><strong>opgprof</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#oparchive">6. Archiving measurements (<span class="command"><strong>oparchive</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#oparchive-details">6.1. Usage of <span class="command"><strong>oparchive</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opimport">7. Converting sample database files (<span class="command"><strong>opimport</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opimport-details">7.1. Usage of <span class="command"><strong>opimport</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#interpreting">5. Interpreting profiling results</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#irq-latency">1. Profiling interrupt latency</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#kernel-profiling">2. Kernel profiling</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#irq-masking">2.1. Interrupt masking</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#idle">2.2. Idle time</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#kernel-modules">2.3. Profiling kernel modules</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#interpreting-callgraph">3. Interpreting call-graph profiles</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#debug-info">4. Inaccuracies in annotated source</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#effect-of-optimizations">4.1. Side effects of optimizations</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#prologues">4.2. Prologues and epilogues</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#inlined-function">4.3. Inlined functions</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#wrong-linenr-info">4.4. Inaccuracy in line number information</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#symbol-without-debug-info">5. Assembly functions</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#overlapping-symbols">6. Overlapping symbols in JITed code</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#hidden-cost">7. Other discrepancies</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#ack">6. Acknowledgments</a> |
| </span> |
| </dt> |
| </dl> |
| </div> |
| <div class="chapter" title="Chapter 1. Introduction"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="introduction"></a>Chapter 1. Introduction</h2> |
| </div> |
| </div> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#applications">1. Applications of OProfile</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#requirements">2. System requirements</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#resources">3. Internet resources</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#install">4. Installation</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#uninstall">5. Uninstalling OProfile</a> |
| </span> |
| </dt> |
| </dl> |
| </div> |
| <p> |
| This manual applies to OProfile version 0.9.7-rc2. |
| OProfile is a profiling system for Linux 2.2/2.4/2.6 systems on a number of architectures. It is capable of profiling |
| all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries |
| to binaries. It runs transparently in the background, collecting information at a low overhead. These |
| features make it well suited to profiling entire real-world systems to find their bottlenecks. |
| </p> |
| <p> |
| Many CPUs provide "performance counters", hardware registers that can count "events" such as |
| cache misses or CPU cycles. OProfile builds profiles of code from these events: |
| each time a certain (configurable) number of events has occurred, the current program counter (PC) value is recorded. |
| This information is aggregated into profiles for each binary image.</p> |
| <p> |
| Some hardware setups do not allow OProfile to use performance counters: in these cases, no |
| events are available, and OProfile operates in timer/RTC mode, as described in later chapters. |
| </p> |
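| <p> |
| As a rough illustration of this sampling scheme, the shell sketch below records only every Nth event and counts the recorded samples per binary image. It is a toy model, not how OProfile is implemented: the <code class="code">sample_every</code> function and its input format (one line per event, naming the image that was executing) are assumptions made for this example. |
| </p> |

```shell
#!/bin/sh
# Toy model of event-based sampling (hypothetical helper, not an OProfile tool).
# stdin: one line per hardware event, naming the binary image running at the time.
# $1:    number of events between samples (the configurable count).
# Output: "<samples> <image>" lines, most-sampled image first.
sample_every() {
    awk -v n="$1" '
        NR % n == 0 { hits[$1]++ }                       # every n-th event is recorded
        END { for (img in hits) print hits[img], img }   # aggregate per image
    ' | sort -rn
}

# Six events, sampled every second event: events 2, 4 and 6 are recorded.
printf '%s\n' libc app app libc app app | sample_every 2
```

| <p> |
| Only three of the six events are sampled here, giving two samples for <code class="filename">app</code> and one for <code class="filename">libc</code>. A larger count between samples lowers the overhead but makes the profile coarser. |
| </p> |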
| <div class="sect1" title="1. Applications of OProfile"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="applications"></a>1. Applications of OProfile</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| OProfile is useful in a number of situations. You might want to use OProfile when you: |
| </p> |
| <div class="itemizedlist"> |
| <ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| <p>need low overhead</p> |
| </li> |
| <li class="listitem"> |
| <p>cannot use highly intrusive profiling methods</p> |
| </li> |
| <li class="listitem"> |
| <p>need to profile interrupt handlers</p> |
| </li> |
| <li class="listitem"> |
| <p>need to profile an application and its shared libraries</p> |
| </li> |
| <li class="listitem"> |
| <p>need to profile dynamically compiled code of supported virtual machines (see <a class="xref" href="#jitsupport" title="1.1. Support for dynamically compiled (JIT) code">Section 1.1, “Support for dynamically compiled (JIT) code”</a>)</p> |
| </li> |
| <li class="listitem"> |
| <p>need to capture the performance behaviour of an entire system</p> |
| </li> |
| <li class="listitem"> |
| <p>want to examine hardware effects such as cache misses</p> |
| </li> |
| <li class="listitem"> |
| <p>want detailed source annotation</p> |
| </li> |
| <li class="listitem"> |
| <p>want instruction-level profiles</p> |
| </li> |
| <li class="listitem"> |
| <p>want call-graph profiles</p> |
| </li> |
| </ul> |
| </div> |
| <p> |
| OProfile is not a panacea. OProfile might not be a complete solution when you: |
| </p> |
| <div class="itemizedlist"> |
| <ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| <p>require call graph profiles on platforms other than 2.6/x86</p> |
| </li> |
| <li class="listitem"> |
| <p>don't have root permissions</p> |
| </li> |
| <li class="listitem"> |
| <p>require 100% instruction-accurate profiles</p> |
| </li> |
| <li class="listitem"> |
| <p>need function call counts or an interstitial profiling API</p> |
| </li> |
| <li class="listitem"> |
| <p>cannot tolerate any disturbance to the system whatsoever</p> |
| </li> |
| <li class="listitem"> |
| <p>need to profile interpreted or dynamically compiled code of non-supported virtual machines</p> |
| </li> |
| </ul> |
| </div> |
| <div class="sect2" title="1.1. Support for dynamically compiled (JIT) code"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="jitsupport"></a>1.1. Support for dynamically compiled (JIT) code</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Older versions of OProfile were not capable of attributing samples to symbols from dynamically |
| compiled code, i.e. "just-in-time (JIT) code". Typical JIT compilers load the JIT code into |
| anonymous memory regions. OProfile reported the samples from such code, but the attribution |
| provided was simply: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">"anon: &lt;tgid&gt;&lt;address range&gt;" </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Due to this limitation, it wasn't possible to profile applications executed by virtual machines (VMs) |
| like the Java Virtual Machine. OProfile now contains an infrastructure to support JITed code. |
| A development library is provided to allow developers |
| to add support for any VM that produces dynamically compiled code (see the <span class="emphasis"><em>OProfile JIT agent |
| developer guide</em></span>). |
| In addition, built-in support is included for the following:</p> |
| <div class="itemizedlist"> |
| <ul class="itemizedlist" type="disc"> |
| <li class="listitem">JVMTI agent library for Java (1.5 and higher)</li> |
| <li class="listitem">JVMPI agent library for Java (1.5 and lower)</li> |
| </ul> |
| </div> |
| <p> |
| For information on how to use OProfile's JIT support, see <a class="xref" href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, “Setting up the JIT profiling feature”</a>. |
| </p> |
| </div> |
| </div> |
| <div class="sect1" title="2. System requirements"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="requirements"></a>2. System requirements</h2> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term">Linux kernel 2.2/2.4/2.6</span> |
| </dt> |
| <dd> |
| <p> |
| For 2.2 and 2.4 kernels, OProfile uses a kernel module that can be compiled for |
| 2.2.11 or later and for 2.4; version 2.4.10 or above is required if you use the |
| boot-time kernel option <code class="option">nosmp</code>. 2.6 kernels are supported with the in-kernel |
| OProfile driver. Note that only 32-bit x86 and IA64 are supported on 2.2/2.4 kernels. |
| </p> |
| <p> |
| 2.6 kernels are strongly recommended. Under 2.4, OProfile may cause system crashes if power |
| management is used, or the BIOS does not correctly deal with local APICs. |
| </p> |
| <p> |
| To use OProfile's JIT support, kernel version 2.6.13 or later is required. |
| Earlier kernel versions do not report anonymous memory regions to OProfile, which results |
| in profiling reports without any samples in these regions. |
| </p> |
| <p> |
| PPC64 processors (Power4/Power5/PPC970, etc.) require a recent (> 2.6.5) kernel with the line |
| <code class="constant">#define PV_970</code> present in <code class="filename">include/asm-ppc64/processor.h</code>. |
| |
| </p> |
| <p> |
| Profiling the Cell Broadband Engine PowerPC Processing Element (PPE) requires a kernel version |
| of 2.6.18 or more recent. |
| Profiling the Cell Broadband Engine Synergistic Processing Element (SPE) requires a kernel version |
| of 2.6.22 or more recent. Additionally, full support of SPE profiling requires a BFD library |
| from binutils code dated January 2007 or later. To ensure the proper BFD support exists, run |
| the <code class="code">configure</code> utility with <code class="code">--with-target=cell-be</code>. |
| |
| Profiling the Cell Broadband Engine using SPU events requires a kernel version of 2.6.29-rc1 |
| or more recent. |
| |
| </p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Attempting to profile SPEs with kernel versions older than 2.6.22 may cause the |
| system to crash.</div> |
| <p> |
| </p> |
| <p> |
| Instruction-Based Sampling (IBS) profiling on AMD Family 10h processors requires |
| kernel version 2.6.28-rc2 or later. |
| </p> |
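| <p> |
| The kernel version thresholds above can be checked against the running kernel with a short script. The following is a minimal sketch, not part of OProfile: the helper names <code class="code">split_version</code> and <code class="code">kernel_at_least</code> and the <code class="code">KERNEL_RELEASE</code> override are hypothetical, and release-candidate suffixes such as <code class="code">-rc2</code> are ignored, so 2.6.28-rc1 would be treated like 2.6.28. |
| </p> |

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with OProfile) for checking the kernel
# versions listed above. Suffixes such as "-rc2" or "-generic" are ignored,
# so a release candidate is treated like the corresponding final release.

# Print "major minor patch" for a release string such as 2.6.18-rc1.
split_version() {
    v=${1%%[!0-9.]*}            # keep only the leading digits and dots
    old_ifs=$IFS
    IFS=.
    set -- $v                   # split on dots into $1 $2 $3
    IFS=$old_ifs
    echo "${1:-0} ${2:-0} ${3:-0}"
}

# Succeed if release $1 is at least version $2.
kernel_at_least() {
    set -- $(split_version "$1") $(split_version "$2")
    [ "$1" -gt "$4" ] && return 0
    [ "$1" -lt "$4" ] && return 1
    [ "$2" -gt "$5" ] && return 0
    [ "$2" -lt "$5" ] && return 1
    [ "$3" -ge "$6" ]
}

# KERNEL_RELEASE is an assumed override for testing; default is the running kernel.
running=${KERNEL_RELEASE:-$(uname -r)}
while read -r min feature; do
    if kernel_at_least "$running" "$min"; then
        status="ok"
    else
        status="needs kernel $min or later"
    fi
    printf '%-24s %s\n' "$feature" "$status"
done <<EOF
2.6.13 JIT profiling
2.6.18 Cell PPE profiling
2.6.22 Cell SPE profiling
2.6.28 AMD IBS profiling
2.6.29 Cell SPU events
EOF
```

| <p> |
| The check only compares version numbers; it does not verify that the OProfile driver or the relevant hardware is actually present. |
| </p> |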
| </dd> |
| <dt> |
| <span class="term">modutils 2.4.6 or above</span> |
| </dt> |
| <dd> |
| <p> |
| You should have installed modutils 2.4.6 or higher (in fact earlier versions work well in almost all |
| cases). |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Supported architecture</span> |
| </dt> |
| <dd> |
| <p> |
| For Intel IA32, a CPU with either a P6 generation or Pentium 4 core is |
| required. In marketing terms this translates to anything |
| between an Intel Pentium Pro (not Pentium Classics) and |
| a Pentium 4 / Xeon, including all Celerons. The AMD |
| Athlon, Opteron, Phenom, and Turion CPUs are also supported. Other IA32 |
| CPU types only support the RTC mode of OProfile; please |
| see later in this manual for details. Hyper-threaded Pentium 4s |
| are not supported in 2.4. For 2.4 kernels, the Intel |
| IA-64 CPUs are also supported. For 2.6 kernels, there is additionally |
| support for Alpha processors, MIPS, ARM, x86-64, sparc64, ppc64, AVR32, and, |
| in timer mode, PA-RISC and s390. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Uniprocessor or SMP</span> |
| </dt> |
| <dd> |
| <p> |
| SMP machines are fully supported. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Required libraries</span> |
| </dt> |
| <dd> |
| <p> |
| These libraries are required: <code class="filename">popt</code>, <code class="filename">bfd</code>, |
| <code class="filename">liberty</code> (Debian users: libiberty is provided in the binutils-dev package), <code class="filename">dl</code>, |
| plus the standard C++ libraries. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Required user account</span> |
| </dt> |
| <dd> |
| <p> |
| For secure processing of sample data from JIT virtual machines (e.g., Java), |
| the special user account "oprofile" must exist on the system. The 'configure' |
| and 'make install' operations will print warning messages if this |
| account is not found. If you intend to profile JITed code, you must create |
| a group account named 'oprofile' and then create the 'oprofile' user account, |
| setting the default group to 'oprofile'. A runtime error message is printed to |
| the oprofile daemon log when processing JIT samples if this special user |
| account cannot be found. |
| </p> |
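| <p> |
| One way to create the group and user is sketched below. This is an illustration under assumptions, not an official procedure: it uses the common <code class="code">getent</code>, <code class="code">groupadd</code> and <code class="code">useradd</code> utilities, which may be named or packaged differently on your distribution, and the <code class="code">DRY_RUN</code> variable and <code class="code">run</code> helper are hypothetical. By default the script only prints what it would do. |
| </p> |

```shell
#!/bin/sh
# Sketch: create the 'oprofile' group and user if they are missing.
# DRY_RUN and run() are hypothetical conveniences for this example.
# Set DRY_RUN=0 and run as root to actually create the accounts.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

ensure_oprofile_account() {
    # Create the group first so it can be used as the user's default group.
    getent group oprofile >/dev/null 2>&1 || run groupadd oprofile
    getent passwd oprofile >/dev/null 2>&1 || run useradd -g oprofile oprofile
}

ensure_oprofile_account
```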
| </dd> |
| <dt> |
| <span class="term">OProfile GUI</span> |
| </dt> |
| <dd> |
| <p> |
| The use of the GUI to start the profiler requires the <code class="filename">Qt</code> library. |
| Either <code class="filename">Qt 3</code> or <code class="filename">Qt 4</code> should work. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <acronym class="acronym">ELF</acronym> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Probably not too strenuous a requirement, but older <acronym class="acronym">A.OUT</acronym> binaries/libraries are not supported. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">K&amp;R coding style</span> |
| </dt> |
| <dd> |
| <p> |
| OK, so it's not really a requirement, but I wish it was... |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect1" title="3. Internet resources"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="resources"></a>3. Internet resources</h2> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term">Web page</span> |
| </dt> |
| <dd> |
| <p> |
| There is a web page (which you may be reading now) at |
| <a class="ulink" href="http://oprofile.sf.net/">http://oprofile.sf.net/</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Download</span> |
| </dt> |
| <dd> |
| <p> |
| You can download a source tarball or check out code from |
| the code repository at the sourceforge page, |
| <a class="ulink" href="http://sf.net/projects/oprofile/">http://sf.net/projects/oprofile/</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Mailing list</span> |
| </dt> |
| <dd> |
| <p> |
| There is a low-traffic OProfile-specific mailing list, details at |
| <a class="ulink" href="http://sf.net/mail/?group_id=16191">http://sf.net/mail/?group_id=16191</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Bug tracker</span> |
| </dt> |
| <dd> |
| <p> |
| There is a bug tracker for OProfile at SourceForge, |
| <a class="ulink" href="http://sf.net/tracker/?group_id=16191&amp;atid=116191">http://sf.net/tracker/?group_id=16191&amp;atid=116191</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">IRC channel</span> |
| </dt> |
| <dd> |
| <p> |
| Several OProfile developers and users sometimes hang out on channel <span class="command"><strong>#oprofile</strong></span> |
| on the <a class="ulink" href="http://oftc.net">OFTC</a> network. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect1" title="4. Installation"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="install"></a>4. Installation</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| First you need to build OProfile and install it. <span class="command"><strong>./configure</strong></span>, <span class="command"><strong>make</strong></span>, <span class="command"><strong>make install</strong></span> |
| is often all you need, but note these arguments to <span class="command"><strong>./configure</strong></span>: |
| </p> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">--with-linux</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Use this option to specify the location of the kernel source tree you wish |
| to compile against. The kernel module is built against this source and |
| will only work with a running kernel built from the same source with |
| exactly the same options, so it is important that you specify this option if you need |
| to. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--with-java</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Use this option if you need to profile Java applications. Also, see |
| <a class="xref" href="#requirements" title="2. System requirements">Section 2, “System requirements”</a>, "Required user account". This option |
| is used to specify the location of the Java Development Kit (JDK) |
| source tree you wish to use. This is necessary to get the interface description |
| of the JVMPI (or JVMTI) interface to compile the JIT support code successfully. |
| </p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| The Java Runtime Environment (JRE) does not include the development |
| files that are required to compile the JIT support code, so the full |
| JDK must be installed in order to use this option. |
| </p> |
| </div> |
| <p> |
| By default, the OProfile JIT support libraries will be installed in |
| <code class="filename">&lt;oprof_install_dir&gt;/lib/oprofile</code>. To build |
| and install OProfile and the JIT support libraries as 64-bit, you can |
| do something like the following: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # CFLAGS="-m64" CXXFLAGS="-m64" ./configure \ |
| --with-kernel-support --with-java={my_jdk_installdir} \ |
| --libdir=/usr/local/lib64 |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| </p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| If you encounter errors building 64-bit, you should |
| install libtool 1.5.26 or later since that release of |
| libtool fixes known problems for certain platforms. |
| If you install libtool into a non-standard location, |
| you'll need to edit the invocation of 'aclocal' in |
| OProfile's autogen.sh as follows (assume an install |
| location of /usr/local): |
| </p> |
| <p> |
| <code class="code">aclocal -I m4 -I /usr/local/share/aclocal</code> |
| </p> |
| </div> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--with-kernel-support</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Use this option with 2.6 and above kernels to indicate the |
| kernel provides the OProfile device driver. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--with-qt-dir/includes/libraries</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Specify the location of Qt headers and libraries. It defaults to searching in |
| <code class="constant">$QTDIR</code> if these are not specified. |
| </p> |
| </dd> |
| <dt> |
| <a id="disable-werror"></a> |
| <span class="term"> |
| <code class="option">--disable-werror</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Development versions of OProfile build by |
| default with <code class="option">-Werror</code>. This option turns |
| <code class="option">-Werror</code> off. |
| </p> |
| </dd> |
| <dt> |
| <a id="disable-optimization"></a> |
| <span class="term"> |
| <code class="option">--disable-optimization</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Disable the <code class="option">-O2</code> compiler flag |
| (useful if you discover an OProfile bug and want to give a useful |
| back-trace etc.) |
| </p> |
| </dd> |
| </dl> |
| </div> |
| <p> |
| You'll need a configured kernel source tree for the current kernel |
| to build the module for 2.4 kernels. Since every distribution ships a different kernel, the running kernel is unlikely to match the configured source |
| you installed. The safest approach is to compile your own kernel, boot it, and then compile OProfile against it. |
| </p> |
| <p> |
| If you have a uniprocessor machine, it is also recommended that you enable local APIC / IO_APIC support in |
| your kernel (this is automatically enabled for SMP kernels). With many BIOSes, on a uniprocessor (UP) kernel 2.6.9 or later, enabling the local APIC in the kernel configuration is not sufficient: you must also turn it on explicitly at boot time by passing the "lapic" option to the kernel. |
| </p> |
| <p> |
| On machines with power management, such as laptops, power management |
| must be turned off when using OProfile with 2.4 kernels, because the power management software |
| in the BIOS cannot handle the non-maskable interrupts (NMIs) used by |
| OProfile for data collection. If you use the NMI watchdog, be aware that |
| the watchdog is disabled when profiling starts, and not re-enabled until the |
| OProfile module is removed (or, in 2.6, when OProfile is not running). If you compile OProfile for |
| a 2.2 kernel, you must be root to compile the module. If you are using |
| 2.6 kernels or higher, you do not need kernel source, as long as the |
| OProfile driver is enabled; additionally, you should not need to disable |
| power management. |
| </p> |
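| <p> |
| To see whether the running kernel was booted with the "lapic" option mentioned above, you can inspect <code class="filename">/proc/cmdline</code>. The sketch below is only an illustration; the <code class="code">has_boot_option</code> helper is hypothetical. Note that the absence of the option does not by itself mean the local APIC is disabled, since many systems enable it without the option. |
| </p> |

```shell
#!/bin/sh
# Hypothetical helper: succeed if option $1 appears as a whole word (or as
# "$1=value") in the kernel command line given in $2.
has_boot_option() {
    case " $2 " in
        *" $1 "*|*" $1="*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the command line the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    if has_boot_option lapic "$(cat /proc/cmdline)"; then
        echo "kernel was booted with lapic"
    else
        echo "kernel was booted without lapic"
    fi
fi
```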
| <p> |
| Please note that you must save or have available the <code class="filename">vmlinux</code> file |
| generated during a kernel compile, as OProfile needs it (you can use |
| <code class="option">--no-vmlinux</code>, but this will prevent kernel profiling). |
| </p> |
| </div> |
| <div class="sect1" title="5. Uninstalling OProfile"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="uninstall"></a>5. Uninstalling OProfile</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| You must have the source tree available to uninstall OProfile; a <span class="command"><strong>make uninstall</strong></span> will |
| remove all installed files except your configuration file in the directory <code class="filename">~/.oprofile</code>. |
| </p> |
| </div> |
| </div> |
| <div class="chapter" title="Chapter 2. Overview"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="overview"></a>Chapter 2. Overview</h2> |
| </div> |
| </div> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#getting-started">1. Getting started</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#tools-overview">2. Tools summary</a> |
| </span> |
| </dt> |
| </dl> |
| </div> |
| <div class="sect1" title="1. Getting started"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="getting-started"></a>1. Getting started</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| Before you can use OProfile, you must set it up. The minimum setup required for this |
| is to tell OProfile where the <code class="filename">vmlinux</code> file corresponding to the |
| running kernel is, for example: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --vmlinux=/boot/vmlinux-`uname -r`</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| If you don't want to profile the kernel itself, |
| you can tell OProfile you don't have a <code class="filename">vmlinux</code> file: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --no-vmlinux</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Now we are ready to start the daemon (<span class="command"><strong>oprofiled</strong></span>) which collects |
| the profile data : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --start</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| When you want to stop profiling, you can do so with : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --shutdown</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Note that unlike <span class="command"><strong>gprof</strong></span>, no instrumentation (<code class="option">-pg</code> |
| and <code class="option">-a</code> options to <span class="command"><strong>gcc</strong></span>) |
| is necessary. |
| </p> |
| <p> |
| Periodically (or on <span class="command"><strong>opcontrol --shutdown</strong></span> or <span class="command"><strong>opcontrol --dump</strong></span>) |
| the profile data is written out into the $SESSION_DIR/samples directory (by default at <code class="filename">/var/lib/oprofile/samples</code>). |
| These profile files cover shared libraries, applications, the kernel (vmlinux), and kernel modules. |
| You can clear the profile data (at any time) with <span class="command"><strong>opcontrol --reset</strong></span>. |
| </p> |
| <p> |
| To place these sample database files in a specific directory instead of the default location (<code class="filename">/var/lib/oprofile</code>), use the <code class="option">--session-dir=dir</code> option. You must pass the same <code class="option">--session-dir</code> option to every later <span class="command"><strong>opcontrol</strong></span> invocation and to the analysis tools so that they continue using this directory; it cannot currently be specified through an environment variable. For example : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --no-vmlinux --session-dir=/home/me/tmpsession</pre> |
| </td> |
| </tr> |
| </table> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --start --session-dir=/home/me/tmpsession</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| You can get summaries of this data in a number of ways at any time. To get a summary of |
| data across the entire system for all of these profiles, you can do : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opreport [--session-dir=dir]</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Or to get a more detailed summary, for a particular image, you can do something like : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opreport -l /boot/vmlinux-`uname -r`</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| There are also a number of other ways of presenting the data, as described later in this manual. |
| Note that OProfile will choose a default profiling setup for you. However, there are a number |
| of options you can pass to <span class="command"><strong>opcontrol</strong></span> if you need to change something, |
| also detailed later. |
| </p> |
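| <p> |
| Putting the steps above together, a minimal session might look like the following sketch. |
| It assumes that you run the commands as root, that you do not need kernel symbols, and that |
| <code class="filename">./my_workload</code> is a hypothetical program standing in for whatever you want to profile. |
| Lines beginning with <code class="literal">#</code> are comments : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # Configure OProfile without a vmlinux file (no kernel symbols) |
| opcontrol --no-vmlinux |
| # Start the daemon and begin collecting samples |
| opcontrol --start |
| # Run the workload to be measured (hypothetical program) |
| ./my_workload |
| # Flush the collected samples so that the report is up to date |
| opcontrol --dump |
| # Print a system-wide summary of the samples collected so far |
| opreport |
| # Stop profiling and kill the daemon |
| opcontrol --shutdown |
| </pre> |
| </td> |
| </tr> |
| </table> |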
| </div> |
| <div class="sect1" title="2. Tools summary"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="tools-overview"></a>2. Tools summary</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| This section gives a brief description of the available OProfile utilities and their purpose. |
| </p> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="filename">ophelp</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility lists the available events and short descriptions. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opcontrol</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Used for controlling the OProfile data collection, discussed in <a class="xref" href="#controlling" title="Chapter 3. Controlling the profiler">Chapter 3, <i>Controlling the profiler</i></a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">agent libraries</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Used by virtual machines (like the Java VM) to record information about JITed code being profiled. See <a class="xref" href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, “Setting up the JIT profiling feature”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opreport</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This is the main tool for retrieving useful profile data, described in |
| <a class="xref" href="#opreport" title="2. Image summaries and symbol summaries (opreport)">Section 2, “Image summaries and symbol summaries (<span class="command"><strong>opreport</strong></span>)”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opannotate</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility can be used to produce annotated source, assembly or mixed source/assembly. |
| Source level annotation is available only if the application was compiled with |
| debugging symbols. See <a class="xref" href="#opannotate" title="3. Outputting annotated source (opannotate)">Section 3, “Outputting annotated source (<span class="command"><strong>opannotate</strong></span>)”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opgprof</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility can output gprof-style data files for a binary, for use with |
| <span class="command"><strong>gprof -p</strong></span>. See <a class="xref" href="#opgprof" title="5. gprof-compatible output (opgprof)">Section 5, “<span class="command"><strong>gprof</strong></span>-compatible output (<span class="command"><strong>opgprof</strong></span>)”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">oparchive</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility can be used to collect executables, debuginfo, |
| and sample files and copy the files into an archive. |
| The archive is self-contained and can be moved to another |
| machine for further analysis. |
| See <a class="xref" href="#oparchive" title="6. Archiving measurements (oparchive)">Section 6, “Archiving measurements (<span class="command"><strong>oparchive</strong></span>)”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opimport</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility converts sample database files from a foreign binary format (ABI) to |
| the native format. This is useful only when moving sample files between hosts, |
| for analysis on platforms other than the one used for collection. |
| See <a class="xref" href="#opimport" title="7. Converting sample database files (opimport)">Section 7, “Converting sample database files (<span class="command"><strong>opimport</strong></span>)”</a>. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| <div class="chapter" title="Chapter 3. Controlling the profiler"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="controlling"></a>Chapter 3. Controlling the profiler</h2> |
| </div> |
| </div> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#controlling-daemon">1. Using <span class="command"><strong>opcontrol</strong></span></a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opcontrolexamples">1.1. Examples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#eventspec">1.2. Specifying performance counter events</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#setup-jit">2. Setting up the JIT profiling feature</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#setup-jit-jvm">2.1. JVM instrumentation</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#oprofile-gui">3. Using <span class="command"><strong>oprof_start</strong></span></a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#detailed-parameters">4. Configuration details</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#hardware-counters">4.1. Hardware performance counters</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#rtc">4.2. OProfile in RTC mode</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#timer">4.3. OProfile in timer interrupt mode</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#p4">4.4. Pentium 4 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#ia64">4.5. Intel Itanium 2 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#ppc64">4.6. PowerPC64 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#cell-be">4.7. Cell Broadband Engine support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#misuse">4.9. Dangerous counter settings</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| </dl> |
| </div> |
| <div class="sect1" title="1. Using opcontrol"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="controlling-daemon"></a>1. Using <span class="command"><strong>opcontrol</strong></span></h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| In this section we describe the configuration and control of the profiling system |
| with opcontrol in more depth. |
| The <span class="command"><strong>opcontrol</strong></span> script has a default setup, but you |
| can alter this with the options given below. In particular, |
| if your hardware supports performance counters, you can configure them. |
| There are a number of counters (for example, counter 0 and counter 1 |
| on the Pentium III). Each of these counters can be programmed with |
| an event to count, such as cache misses or MMX operations. The event |
| chosen for each counter is reflected in the profile data collected |
| by OProfile: functions and binaries at the top of the profiles reflect |
| that most of the chosen events happened within that code. |
| </p> |
| <p> |
| Additionally, each counter has a "count" value: the number of events that must occur |
| between samples, which determines how detailed the profile is. The lower the value, the more frequently profile |
| samples are taken. A counter can be configured to sample only kernel code, user-space code, |
| or both (both is the default). Finally, some events have a "unit mask": |
| a value that further restricts the types of event that are counted. |
| The event types and unit masks for your CPU are listed by <span class="command"><strong>opcontrol |
| --list-events</strong></span>. |
| </p> |
| <p> |
| The <span class="command"><strong>opcontrol</strong></span> script provides the following actions : |
| </p> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">--init</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Loads the OProfile module if required and makes the OProfile driver |
| interface available. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--setup</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Followed by a list of arguments for profiling setup. The arguments are |
| saved in <code class="filename">/root/.oprofile/daemonrc</code>. |
| Giving this option is not necessary; you can pass any of the setup |
| options directly, e.g. <span class="command"><strong>opcontrol --no-vmlinux</strong></span>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--status</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show configuration information. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--start-daemon</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Start the oprofile daemon without starting actual profiling. The profiling |
| can then be started using <code class="option">--start</code>. This is useful for avoiding |
| measuring the cost of daemon startup, as <code class="option">--start</code> is a simple |
| write to a file in oprofilefs. Not available in 2.2/2.4 kernels. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--start</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Start data collection with either arguments provided by <code class="option">--setup</code> |
| or information saved in <code class="filename">/root/.oprofile/daemonrc</code>. Additionally |
| specifying <code class="option">--verbose</code> makes the daemon generate a lot of debug data |
| while it is running. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--dump</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Force a flush of the collected profiling data to the daemon. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--stop</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Stop data collection (this separate step is not possible with 2.2 or 2.4 kernels). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--shutdown</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Stop data collection and kill the daemon. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--reset</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Clears out data from current session, but leaves saved sessions. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--save=</code>session_name</span> |
| </dt> |
| <dd> |
| <p> |
| Save data from current session to session_name. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--deinit</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Shuts down the daemon, then unloads the OProfile module and oprofilefs. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--list-events</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| List event types and unit masks. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--help</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Generate usage messages. |
| </p> |
| </dd> |
| </dl> |
| </div> |
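| <p> |
| As an illustration of how these actions fit together, the following sketch walks through one |
| complete profiling cycle. It assumes a 2.6 kernel (so that <code class="option">--start-daemon</code> |
| and <code class="option">--stop</code> are available), a root shell (shown by the |
| <code class="literal">#</code> prompt), and no kernel symbol information; |
| <span class="command"><strong>my_favourite_benchmark</strong></span> is a placeholder for your own workload : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --init |
| # opcontrol --no-vmlinux |
| # opcontrol --status |
| # opcontrol --start-daemon |
| # opcontrol --start |
| # my_favourite_benchmark --run |
| # opcontrol --stop |
| # opcontrol --dump |
| # opcontrol --shutdown |
| # opcontrol --deinit |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| The daemon is started before the measured run so that its startup cost is not profiled, and the |
| module is unloaded only after the daemon has been shut down. |
| </p> |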
| <p> |
| There are a number of possible settings, of which only |
| <code class="option">--vmlinux</code> (or <code class="option">--no-vmlinux</code>) |
| is required. These settings are stored in <code class="filename">~/.oprofile/daemonrc</code>. |
| </p> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"><code class="option">--buffer-size=</code>num</span> |
| </dt> |
| <dd> |
| <p> |
| Number of samples in the kernel buffer. When using a 2.6 kernel, the |
| buffer watershed needs to be adjusted when changing this value. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--buffer-watershed=</code>num</span> |
| </dt> |
| <dd> |
| <p> |
| Set the kernel buffer watershed to num samples (2.6 only). When only |
| buffer-size - buffer-watershed free entries remain in the kernel buffer, the data is |
| flushed to the daemon. The most useful values are in the range [0.25 - 0.5] * buffer-size. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--cpu-buffer-size=</code>num</span> |
| </dt> |
| <dd> |
| <p> |
| Number of samples in the kernel per-CPU buffer (2.6 only). If you |
| profile at a high rate, increasing this value can help if the log |
| file shows an excessive count of samples lost to CPU buffer overflow. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--event=</code>[eventspec]</span> |
| </dt> |
| <dd> |
| <p> |
| Use the given performance counter event to profile. |
| See <a class="xref" href="#eventspec" title="1.2. Specifying performance counter events">Section 1.2, “Specifying performance counter events”</a> below. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--session-dir=</code>dir_path</span> |
| </dt> |
| <dd> |
| <p> |
| Create or use the sample database in directory <code class="filename">dir_path</code> instead of |
| the default location (<code class="filename">/var/lib/oprofile</code>). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--separate=</code>[none,lib,kernel,thread,cpu,all]</span> |
| </dt> |
| <dd> |
| <p> |
| By default, every profile is stored in a single file. Thus, for example, |
| samples in the C library are all credited to the <code class="filename">/lib/libc.o</code> |
| profile. However, you can choose to create separate sample files by specifying |
| one of the options below. |
| </p> |
| <div class="informaltable"> |
| <table border="1"> |
| <colgroup> |
| <col /> |
| <col /> |
| </colgroup> |
| <tbody> |
| <tr> |
| <td> |
| <code class="option">none</code> |
| </td> |
| <td>No profile separation (default)</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">lib</code> |
| </td> |
| <td>Create per-application profiles for libraries</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">kernel</code> |
| </td> |
| <td>Create per-application profiles for the kernel and kernel modules</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">thread</code> |
| </td> |
| <td>Create profiles for each thread and each task</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">cpu</code> |
| </td> |
| <td>Create profiles for each CPU</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">all</code> |
| </td> |
| <td>All of the above options</td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <p> |
| Note that <code class="option">--separate=kernel</code> also turns on <code class="option">--separate=lib</code>. |
| |
| When using <code class="option">--separate=kernel</code>, samples in hardware interrupts, soft-irqs, or other |
| asynchronous kernel contexts are credited to the task currently running. This means you will see |
| seemingly nonsense profiles such as <code class="filename">/bin/bash</code> showing samples for the PPP modules, |
| etc. |
| </p> |
| <p> |
| On 2.2/2.4 kernels, only kernel threads already started when profiling begins are correctly profiled; |
| newly started kernel thread samples are credited to the vmlinux (kernel) profile. |
| </p> |
| <p> |
| Using <code class="option">--separate=thread</code> creates a lot |
| of sample files if you leave OProfile running for a while; it's most |
| useful when used for short sessions, or when using image filtering. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--callgraph=</code>#depth</span> |
| </dt> |
| <dd> |
| <p> |
| Enable call-graph sample collection with a maximum depth. Use 0 to disable |
| callgraph profiling. NOTE: Callgraph support is available on a limited |
| number of platforms at this time; for example: |
| </p> |
| <div class="itemizedlist"> |
| <ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| <p>x86 with recent 2.6 kernel</p> |
| </li> |
| <li class="listitem"> |
| <p>ARM with recent 2.6 kernel</p> |
| </li> |
| <li class="listitem"> |
| <p>PowerPC with 2.6.17 kernel</p> |
| </li> |
| </ul> |
| </div> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--image=</code>image,[images]|"all"</span> |
| </dt> |
| <dd> |
| <p> |
| Image filtering. If you specify one or more absolute |
| paths to binaries, OProfile will only produce profile results for those |
| binary images. This is useful for restricting the sometimes voluminous |
| output you may get otherwise, especially with |
| <code class="option">--separate=thread</code>. Note that if you are using |
| <code class="option">--separate=lib</code> or |
| <code class="option">--separate=kernel</code>, then if you specify an |
| application binary, the shared libraries and kernel code |
| <span class="emphasis"><em>are</em></span> included. Specify the value |
| "all" to profile everything (the default). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--vmlinux=</code>file</span> |
| </dt> |
| <dd> |
| <p> |
| Path to the vmlinux kernel image file corresponding to the running kernel. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--no-vmlinux</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Use this when you don't have a kernel vmlinux file, and you don't want |
| to profile the kernel. This still counts the total number of kernel samples, |
| but can't give symbol-based results for the kernel or any modules. |
| </p> |
| </dd> |
| </dl> |
| </div> |
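| <p> |
| The settings above can be combined as in the following sketch. All values are illustrative |
| assumptions rather than recommendations: the watershed of 16384 is simply 0.25 times the buffer size of 65536, |
| <code class="filename">/usr/local/bin/my_app</code> is a hypothetical application binary, |
| and a call-graph depth of 6 presumes a platform with call-graph support : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --buffer-size=65536 --buffer-watershed=16384 |
| # opcontrol --separate=thread --image=/usr/local/bin/my_app |
| # opcontrol --callgraph=6 |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Restricting the images while using <code class="option">--separate=thread</code> keeps the number |
| of sample files manageable. Because the settings are saved, they remain in effect for later |
| <span class="command"><strong>opcontrol --start</strong></span> invocations until you change them. |
| </p> |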
| <div class="sect2" title="1.1. Examples"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opcontrolexamples"></a>1.1. Examples</h3> |
| </div> |
| </div> |
| </div> |
| <div class="sect3" title="1.1.1. Intel performance counter setup"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="examplesperfctr"></a>1.1.1. Intel performance counter setup</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| Here, we have a Pentium III running at 800MHz, and we want to look at where data memory |
| references are happening most, and also get results for CPU time. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --event=CPU_CLK_UNHALTED:400000 --event=DATA_MEM_REFS:10000 |
| # opcontrol --vmlinux=/boot/2.6.0/vmlinux |
| # opcontrol --start |
| </pre> |
| </td> |
| </tr> |
| </table> |
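| <p> |
| As a rough check on the sampling rate this implies: a counter takes one sample each time it has |
| counted the given number of events, so at 800MHz a count of 400000 for |
| <code class="constant">CPU_CLK_UNHALTED</code> gives at most about 2000 samples per second |
| on each CPU while it is not halted. This assumes the clock runs at a steady 800MHz. The arithmetic can be checked in the shell : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # echo $((800000000 / 400000)) |
| 2000 |
| </pre> |
| </td> |
| </tr> |
| </table> |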
| </div> |
| <div class="sect3" title="1.1.2. RTC mode"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="examplesrtc"></a>1.1.2. RTC mode</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| Here, we have an Intel laptop without support for performance counters, running a 2.4 kernel. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # ophelp -r |
| CPU with RTC device |
| # opcontrol --vmlinux=/boot/2.4.13/vmlinux --event=RTC_INTERRUPTS:1024 |
| # opcontrol --start |
| </pre> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="sect3" title="1.1.3. Starting the daemon separately"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="examplesstartdaemon"></a>1.1.3. Starting the daemon separately</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| If we're running a 2.6 kernel, we can use <code class="option">--start-daemon</code> to avoid |
| the profiler startup affecting results. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --vmlinux=/boot/2.6.0/vmlinux |
| # opcontrol --start-daemon |
| # my_favourite_benchmark --init |
| # opcontrol --start ; my_favourite_benchmark --run ; opcontrol --stop |
| </pre> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="sect3" title="1.1.4. Separate profiles for libraries and the kernel"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="exampleseparate"></a>1.1.4. Separate profiles for libraries and the kernel</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| Here, we want to see a profile of the OProfile daemon itself, including when |
| it was running inside the kernel driver, and its use of shared libraries. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --separate=kernel --vmlinux=/boot/2.6.0/vmlinux |
| # opcontrol --start |
| # my_favourite_stress_test --run |
| # opreport -l -p /lib/modules/2.6.0/kernel /usr/local/bin/oprofiled |
| </pre> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="sect3" title="1.1.5. Profiling sessions"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="examplessessions"></a>1.1.5. Profiling sessions</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| It can often be useful to split up profiling data into several different |
| time periods. For example, you may want to collect data on an application's |
| startup separately from the normal runtime data. You can use the simple |
| command <span class="command"><strong>opcontrol --save</strong></span> to do this. For example : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --save=blah |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| will create a sub-directory in <code class="filename">$SESSION_DIR/samples</code> containing the samples |
| up to that point (the current session's sample files are moved into this |
| directory). You can then pass this session name as a parameter to the post-profiling |
| analysis tools, to only get data up to the point you named the |
| session. If you do not want to save a session, you can do |
| <span class="command"><strong>rm -rf $SESSION_DIR/samples/sessionname</strong></span> or, for the |
| current session, <span class="command"><strong>opcontrol --reset</strong></span>. |
| </p> |
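| <p> |
| A sketch of this workflow, separating a benchmark's initialisation phase from its main run, might look as follows. |
| The session names <code class="literal">startup</code> and <code class="literal">run</code> are arbitrary, |
| <span class="command"><strong>my_favourite_benchmark</strong></span> is a placeholder workload, and |
| the <code class="option">session:</code> form used to select a saved session is assumed to be |
| the profile specification syntax accepted by your version of <span class="command"><strong>opreport</strong></span>. |
| The explicit <code class="option">--dump</code> before each save is a precaution to make sure pending samples are written out first : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --start |
| # my_favourite_benchmark --init |
| # opcontrol --dump |
| # opcontrol --save=startup |
| # my_favourite_benchmark --run |
| # opcontrol --dump |
| # opcontrol --save=run |
| # opreport session:startup |
| # opreport session:run |
| </pre> |
| </td> |
| </tr> |
| </table> |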
| </div> |
| </div> |
| <div class="sect2" title="1.2. Specifying performance counter events"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="eventspec"></a>1.2. Specifying performance counter events</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The <code class="option">--event</code> option to <span class="command"><strong>opcontrol</strong></span> |
| takes a specification that indicates how each |
| hardware performance counter should be set up. If you want to |
| revert to OProfile's default setting (<code class="option">--event</code> |
| is strictly optional), use <code class="option">--event=default</code>. Use of this |
| option overrides all previous event selections. |
| </p> |
| <p> |
| You can pass multiple event specifications. OProfile will allocate |
| hardware counters as necessary. Note that some combinations are not |
| allowed by the CPU; running <span class="command"><strong>opcontrol --list-events</strong></span> gives the details |
| of each event. The event specification is a colon-separated string |
| of the form <code class="option"><span class="emphasis"><em>name</em></span>:<span class="emphasis"><em>count</em></span>:<span class="emphasis"><em>unitmask</em></span>:<span class="emphasis"><em>kernel</em></span>:<span class="emphasis"><em>user</em></span></code> as described in this table: |
| </p> |
| <div class="informaltable"> |
| <table border="1"> |
| <colgroup> |
| <col /> |
| <col /> |
| </colgroup> |
| <tbody> |
| <tr> |
| <td> |
| <code class="option">name</code> |
| </td> |
| <td>The symbolic event name, e.g. <code class="constant">CPU_CLK_UNHALTED</code></td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">count</code> |
| </td> |
| <td>The counter reset value, e.g. 100000</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">unitmask</code> |
| </td> |
| <td>The unit mask, as given in the events list: e.g. 0x0f; or a symbolic name as |
| given by the first word of the description (only valid for unit masks having an "extra:" parameter)</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">kernel</code> |
| </td> |
| <td>Whether to profile kernel code</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">user</code> |
| </td> |
| <td>Whether to profile userspace code</td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <p> |
| The last three values are optional; if you omit them (e.g. <code class="option">--event=DATA_MEM_REFS:30000</code>), |
| they will be set to the default values (a unit mask of 0, and profiling both kernel and |
| userspace code). Note that some events require a unit mask. |
| </p> |
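| <p> |
| For instance, the following sketch spells out all five fields for two events in a single invocation. |
| It assumes a Pentium III, where both events accept a unit mask of 0; the counts are illustrative. |
| The first event is sampled in kernel code only (<code class="option">kernel</code>=1, <code class="option">user</code>=0) |
| and the second in userspace code only (<code class="option">kernel</code>=0, <code class="option">user</code>=1) : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --event=CPU_CLK_UNHALTED:400000:0:1:0 --event=DATA_MEM_REFS:30000:0:0:1 |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Both events are given in one command because a later use of <code class="option">--event</code> |
| replaces all earlier event selections. |
| </p> |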
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| For the PowerPC platforms, all events specified must be in the same group; i.e., the group number |
| appended to the event name (e.g. <code class="constant"><<span class="emphasis"><em>some-event-name</em></span>>_GRP9</code>) must be the same. |
| </p> |
| </div> |
| <p> |
| If OProfile is using RTC mode, and you want to alter the default counter value, |
| you can use something like <code class="option">--event=RTC_INTERRUPTS:2048</code>. Note the last |
| three values here are ignored. |
| If OProfile is using timer-interrupt mode, there is no configuration possible. |
| </p> |
| <p> |
| The table below lists the events selected by default |
| (<code class="option">--event=default</code>) for the various computer architectures: |
| </p> |
| <div class="informaltable"> |
| <table border="1"> |
| <colgroup> |
| <col /> |
| <col /> |
| <col /> |
| </colgroup> |
| <tbody> |
| <tr> |
| <td>Processor</td> |
| <td>cpu_type</td> |
| <td>Default event</td> |
| </tr> |
| <tr> |
| <td>Alpha EV4</td> |
| <td>alpha/ev4</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Alpha EV5</td> |
| <td>alpha/ev5</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Alpha PCA56</td> |
| <td>alpha/pca56</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Alpha EV6</td> |
| <td>alpha/ev6</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Alpha EV67</td> |
| <td>alpha/ev67</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>ARM/XScale PMU1</td> |
| <td>arm/xscale1</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>ARM/XScale PMU2</td> |
| <td>arm/xscale2</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>ARM/MPCore</td> |
| <td>arm/mpcore</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>AVR32</td> |
| <td>avr32</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Athlon</td> |
| <td>i386/athlon</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium Pro</td> |
| <td>i386/ppro</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium II</td> |
| <td>i386/pii</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium III</td> |
| <td>i386/piii</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium M (P6 core)</td> |
| <td>i386/p6_mobile</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium 4 (non-HT)</td> |
| <td>i386/p4</td> |
| <td>GLOBAL_POWER_EVENTS:100000:1:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium 4 (HT)</td> |
| <td>i386/p4-ht</td> |
| <td>GLOBAL_POWER_EVENTS:100000:1:1:1</td> |
| </tr> |
| <tr> |
| <td>Hammer</td> |
| <td>x86-64/hammer</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Family10h</td> |
| <td>x86-64/family10</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Family11h</td> |
| <td>x86-64/family11h</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Itanium</td> |
| <td>ia64/itanium</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Itanium 2</td> |
| <td>ia64/itanium2</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>TIMER_INT</td> |
| <td>timer</td> |
| <td>None selectable</td> |
| </tr> |
| <tr> |
| <td>IBM iseries</td> |
| <td>PowerPC 4/5/970</td> |
| <td>CYCLES:10000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>IBM pseries</td> |
| <td>PowerPC 4/5/970/Cell</td> |
| <td>CYCLES:10000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>IBM s390</td> |
| <td>timer</td> |
| <td>None selectable</td> |
| </tr> |
| <tr> |
| <td>IBM s390x</td> |
| <td>timer</td> |
| <td>None selectable</td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| </div> |
| </div> |
| <div class="sect1" title="2. Setting up the JIT profiling feature"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="setup-jit"></a>2. Setting up the JIT profiling feature</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| To gather information about JITed code from a virtual machine, |
| the virtual machine must be instrumented with an agent library. The |
| following example uses the agent libraries for Java. To use the |
| Java profiling feature, you must build OProfile with the "--with-java" option |
| (<a class="xref" href="#install" title="4. Installation">Section 4, “Installation”</a>). |
| |
| </p> |
| <div class="sect2" title="2.1. JVM instrumentation"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="setup-jit-jvm"></a>2.1. JVM instrumentation</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Add this to the startup parameters of the JVM (for JVMTI): |
| |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentpath:<libdir>/libjvmti_oprofile.so[=<options>]</code> </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| or |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentlib:jvmti_oprofile[=<options>]</code> </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| The JVMPI agent implementation is enabled with the command line option |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-Xrunjvmpi_oprofile[:<options>]</code> </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Currently, there is just one option available -- <code class="option">debug</code>. For JVMPI, |
| the convention for specifying an option is <code class="option">option_name=[yes|no]</code>. |
| For JVMTI, the option specification is simply the option name, implying |
| "yes"; no option specified implies "no". |
| </p> |
| <p> |
| The agent library (installed in <code class="filename"><oprof_install_dir>/lib/oprofile</code>) |
| needs to be in the library search path (e.g. add the library directory |
| to <code class="constant">LD_LIBRARY_PATH</code>). The command line of |
| the JVM may not be directly accessible, for example when it is buried within shell scripts or a |
| launcher program. In that case it may be possible to set an environment variable to add |
| the instrumentation. |
| For Sun JVMs this is <code class="constant">JAVA_TOOL_OPTIONS</code>. Please check |
| your JVM documentation for |
| further information on the agent startup options. |
| </p> |
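| <p> |
| The option conventions described above can be sketched as a pair of small helpers. This is only an |
| illustration: the function names are hypothetical, and the library directory used in the usage note is an |
| assumed install location, not one mandated by OProfile. |
| </p> |

```python
def jvmti_agent_option(debug=False, libdir=None):
    """Build the JVMTI startup parameter for the OProfile agent (sketch)."""
    if libdir is not None:
        # -agentpath names the agent library by its full path.
        option = f"-agentpath:{libdir}/libjvmti_oprofile.so"
    else:
        # -agentlib relies on the library search path (e.g. LD_LIBRARY_PATH).
        option = "-agentlib:jvmti_oprofile"
    # For JVMTI, naming the option implies "yes"; omitting it implies "no".
    return option + "=debug" if debug else option


def jvmpi_agent_option(debug=None):
    """Build the JVMPI startup parameter; debug=None leaves the option out."""
    option = "-Xrunjvmpi_oprofile"
    if debug is None:
        return option
    # For JVMPI, options are written as option_name=yes or option_name=no.
    return f"{option}:debug={'yes' if debug else 'no'}"
```

| <p> |
| For instance, <code class="function">jvmti_agent_option(debug=True, libdir="/usr/lib/oprofile")</code> |
| yields an <code class="option">-agentpath</code> parameter ending in <code class="option">=debug</code>, |
| where <code class="filename">/usr/lib/oprofile</code> stands in for your actual install directory. |
| </p> |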
| </div> |
| </div> |
| <div class="sect1" title="3. Using oprof_start"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="oprofile-gui"></a>3. Using <span class="command"><strong>oprof_start</strong></span></h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| The <span class="command"><strong>oprof_start</strong></span> application provides a convenient way to start the profiler. |
| Note that <span class="command"><strong>oprof_start</strong></span> is just a wrapper around the <span class="command"><strong>opcontrol</strong></span> script, |
| so it does not provide more services than the script itself. |
| </p> |
| <p> |
| After <span class="command"><strong>oprof_start</strong></span> is started you can select the event type for each counter; |
| the sampling rate and other related parameters are explained in <a class="xref" href="#controlling-daemon" title="1. Using opcontrol">Section 1, “Using <span class="command"><strong>opcontrol</strong></span>”</a>. |
| The "Configuration" section allows you to set general parameters such as the buffer size, kernel filename |
| etc. The counter setup interface should be self-explanatory; <a class="xref" href="#hardware-counters" title="4.1. Hardware performance counters">Section 4.1, “Hardware performance counters”</a> and related |
| links contain information on using unit masks. |
| </p> |
| <p> |
| A status line shows the current status of the profiler: how long it has been running and, over all |
| processors, the average number of interrupts received per second and the total number received. |
| Note that quitting <span class="command"><strong>oprof_start</strong></span> does not stop the profiler. |
| </p> |
| <p> |
| Your configuration is saved in the same file as <span class="command"><strong>opcontrol</strong></span> uses; that is, |
| <code class="filename">~/.oprofile/daemonrc</code>. |
| </p> |
| </div> |
| <div class="sect1" title="4. Configuration details"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="detailed-parameters"></a>4. Configuration details</h2> |
| </div> |
| </div> |
| </div> |
| <div class="sect2" title="4.1. Hardware performance counters"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="hardware-counters"></a>4.1. Hardware performance counters</h3> |
| </div> |
| </div> |
| </div> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| Your CPU type may not include the requisite support for hardware performance counters, in which case |
| you must use OProfile in RTC mode in 2.4 (see <a class="xref" href="#rtc" title="4.2. OProfile in RTC mode">Section 4.2, “OProfile in RTC mode”</a>), or timer mode in 2.6 (see <a class="xref" href="#timer" title="4.3. OProfile in timer interrupt mode">Section 4.3, “OProfile in timer interrupt mode”</a>). |
| You do not really need to read this section unless you are interested in using |
| events other than the default event chosen by OProfile. |
| </p> |
| </div> |
| <p> |
| The Intel hardware performance counters are detailed in the Intel IA-32 Architecture Manual, Volume 3, available |
| from <a class="ulink" href="http://developer.intel.com/">http://developer.intel.com/</a>. |
| The AMD Athlon/Opteron/Phenom/Turion implementation is detailed in <a class="ulink" href="http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf"> |
| http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf</a>. |
| For PowerPC64 processors in IBM iSeries, pSeries, and blade server systems, processor documentation |
| is available at <a class="ulink" href="http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC/"> |
| http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC</a>. (For example, the |
| specific publication containing information on the performance monitor unit for the PowerPC970 is |
| "IBM PowerPC 970FX RISC Microprocessor User's Manual.") |
| These processors are capable of delivering an interrupt when a counter overflows. |
| This is the basic mechanism on which OProfile is based. The delivery mode is <acronym class="acronym">NMI</acronym>, |
| so blocking interrupts in the kernel does not prevent profiling. When the interrupt handler is called, |
| the current <acronym class="acronym">PC</acronym> value and the current task are recorded into the profiling structure. |
| This allows the overflow event to be attached to a specific assembly instruction in a binary image. |
| The daemon receives this data from the kernel, and writes it to the sample files. |
| </p> |
| <p> |
| If we use an event such as <code class="constant">CPU_CLK_UNHALTED</code> or <code class="constant">INST_RETIRED</code> |
| (<code class="constant">GLOBAL_POWER_EVENTS</code> or <code class="constant">INSTR_RETIRED</code>, respectively, on the Pentium 4), we can |
| use the overflow counts as an estimate of actual time spent in each part of code. Alternatively we can profile interesting |
| data such as the cache behaviour of routines with the other available counters. |
| </p> |
| <p> |
| However there are several caveats. First, there are those issues listed in the Intel manual. There is a delay |
| between the counter overflow and the interrupt delivery that can skew results on a small scale - this means |
| you cannot rely on the profiles at the instruction level as being perfectly accurate. |
| If you are using an "event-mode" counter such as the cache counters, a count registered against an instruction doesn't mean |
| that the instruction is responsible for that event. However, it does imply that the counter overflowed in the dynamic |
| vicinity of that instruction, to within a few instructions. Further details on this problem can be found in |
| <a class="xref" href="#interpreting" title="Chapter 5. Interpreting profiling results">Chapter 5, <i>Interpreting profiling results</i></a> and also in the Digital paper "ProfileMe: A Hardware Performance Counter". |
| </p> |
| <p> |
| Each counter has several configuration parameters. |
| First, there is the unit mask, which further specifies what to count. |
| Second, there is the counter value, discussed below. Third, there are two flags that select whether events are counted |
| whilst in kernel space, in user space, or both. You can configure these separately for each counter. |
| </p> |
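| <p> |
| These parameters map onto the colon-separated event strings used elsewhere in this manual, such as |
| <code class="constant">CPU_CLK_UNHALTED:100000:0:1:1</code>. The following sketch splits such a string into |
| its fields. The names <code class="function">EventSpec</code> and <code class="function">parse_event_spec</code> |
| are hypothetical, and the sketch assumes that all five fields are present. |
| </p> |

```python
from typing import NamedTuple


class EventSpec(NamedTuple):
    name: str        # event name, e.g. CPU_CLK_UNHALTED
    count: int       # counter value (events between overflows)
    unit_mask: int   # further specifies what to count
    kernel: bool     # count while in kernel space
    user: bool       # count while in user space


def parse_event_spec(spec):
    """Split 'name:count:unitmask:kernel:user' into typed fields (sketch)."""
    name, count, unit_mask, kernel, user = spec.split(":")
    # int(..., 0) accepts both decimal ("1") and hexadecimal ("0x01") unit masks.
    return EventSpec(name, int(count), int(unit_mask, 0), kernel == "1", user == "1")
```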
| <p> |
| After each overflow event, the counter is re-initialized |
| so that another overflow occurs after the number of events given by the counter value. Thus, higher |
| values mean less-detailed profiling, and lower values mean more detail, but higher overhead. |
| Picking a good value for this |
| parameter is, unfortunately, somewhat of a black art. It is of course dependent on the event |
| you have chosen. |
| Specifying too large a value will mean not enough interrupts are generated |
| to give a realistic profile (though this problem can be ameliorated by profiling for <span class="emphasis"><em>longer</em></span>). |
| Specifying too small a value can lead to higher performance overhead. |
| </p> |
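| <p> |
| To get a feel for the trade-off, the expected number of samples per second is roughly the rate at which the |
| event occurs divided by the counter value. The sketch below works this out for a hypothetical 2 GHz processor |
| that is never halted, so that a cycle-counting event fires about 2,000,000,000 times per second. The clock |
| rate and the function name are assumptions made for illustration only. |
| </p> |

```python
def samples_per_second(event_rate_hz, counter_value):
    """Approximate sampling rate: one sample per counter_value events."""
    if counter_value <= 0:
        raise ValueError("counter value must be positive")
    return event_rate_hz / counter_value


ASSUMED_CLOCK_HZ = 2_000_000_000  # hypothetical 2 GHz CPU, always busy

# A counter value of 100000 gives about 20000 samples per second per CPU.
default_rate = samples_per_second(ASSUMED_CLOCK_HZ, 100_000)

# A counter value ten times larger gives a tenth of the samples,
# so the profile needs ten times as long to reach the same detail.
coarse_rate = samples_per_second(ASSUMED_CLOCK_HZ, 1_000_000)
```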
| </div> |
| <div class="sect2" title="4.2. OProfile in RTC mode"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="rtc"></a>4.2. OProfile in RTC mode</h3> |
| </div> |
| </div> |
| </div> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| This section applies to 2.2/2.4 kernels only. |
| </p> |
| </div> |
| <p> |
| Some CPU types do not provide the needed hardware support to use the hardware performance counters. This includes |
| some laptops, classic Pentiums, and other CPU types not yet supported by OProfile (such as Cyrix). |
| On these machines, OProfile falls |
| back to using the real-time clock interrupt to collect samples. This interrupt is also used by the <span class="command"><strong>rtc</strong></span> |
| module, so you cannot use OProfile in this mode while the rtc module is loaded or while rtc support is compiled into the kernel. |
| </p> |
| <p> |
| RTC mode is less capable than the hardware counters mode; in particular, it is unable to profile sections of |
| the kernel where interrupts are disabled. There is just one available event, "RTC interrupts", and its value |
| corresponds to the number of interrupts generated per second (that is, a higher number means a better profiling |
| resolution, and higher overhead). The current implementation of the real-time clock supports only power-of-two |
| sampling rates from 2 to 4096 per second. Other values within this range are rounded to the nearest power of |
| two. |
| </p> |
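| <p> |
| The rounding rule can be sketched as follows. The manual does not say how "nearest" is measured or how |
| ties are broken, so this sketch assumes plain arithmetic distance and rounds ties upwards; the real driver |
| may behave differently. The function name is hypothetical. |
| </p> |

```python
def round_rtc_rate(requested):
    """Round a requested RTC rate to a power of two in the range 2..4096."""
    if not 2 <= requested <= 4096:
        raise ValueError("RTC rate must be between 2 and 4096 per second")
    # Largest power of two that does not exceed the request.
    lower = 1 << (requested.bit_length() - 1)
    # Next power of two above it (or the same value if already a power of two).
    upper = lower if lower == requested else lower * 2
    # Assumption: arithmetic distance, ties go to the higher rate.
    return lower if requested - lower < upper - requested else upper
```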
| <p> |
| You can force use of the RTC interrupt with the <code class="option">force_rtc=1</code> module parameter. |
| </p> |
| <p> |
| Setting the value from the GUI should be straightforward. On the command line, you need to specify the |
| event to <span class="command"><strong>opcontrol</strong></span>, e.g.: |
| </p> |
| <p> |
| <span class="command"> |
| <strong>opcontrol --event=RTC_INTERRUPTS:256</strong> |
| </span> |
| </p> |
| </div> |
| <div class="sect2" title="4.3. OProfile in timer interrupt mode"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="timer"></a>4.3. OProfile in timer interrupt mode</h3> |
| </div> |
| </div> |
| </div> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| This section applies to 2.6 kernels and above only. |
| </p> |
| </div> |
| <p> |
| In 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver |
| falls back to using the timer interrupt for profiling. Like the RTC mode in 2.4 kernels, this is not able to |
| profile code that has interrupts disabled. Note that there are no configuration parameters for |
| setting this, unlike the RTC and hardware performance counter setup. |
| </p> |
| <p> |
| You can force use of the timer interrupt by using the <code class="option">timer=1</code> module |
| parameter (or <code class="option">oprofile.timer=1</code> on the boot command line if OProfile is |
| built-in). |
| </p> |
| </div> |
| <div class="sect2" title="4.4. Pentium 4 support"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="p4"></a>4.4. Pentium 4 support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The Pentium 4 / Xeon performance counters are organized around 3 types of model specific registers (MSRs): 45 event |
| selection control registers (ESCRs), 18 counter configuration control registers (CCCRs) and 18 counters. ESCRs describe a |
| particular set of events which are to be recorded, and CCCRs bind ESCRs to counters and configure their |
| operation. Unfortunately the relationship between these registers is quite complex; they cannot all be used with one |
| another at any time. There is, however, a subset of 8 counters, 8 ESCRs, and 8 CCCRs which can be used independently of |
| one another, so OProfile only accesses those registers, treating them as a bank of 8 "normal" counters, similar |
| to those in the P6 or Athlon/Opteron/Phenom/Turion families of CPU. |
| </p> |
| <p> |
| There is currently no support for Precision Event-Based Sampling (PEBS), nor any advanced uses of the Debug Store |
| (DS). Current support is limited to the conservative extension of OProfile's existing interrupt-based model described |
| above. Performance monitoring hardware on Pentium 4 / Xeon processors with Hyper-Threading enabled (multiple logical |
| processors on a single die) is not supported in 2.4 kernels (you can still use OProfile if you disable |
| Hyper-Threading). |
| </p> |
| </div> |
| <div class="sect2" title="4.5. Intel Itanium 2 support"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="ia64"></a>4.5. Intel Itanium 2 support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The Itanium 2 performance monitoring unit (PMU) organizes the counters as four |
| pairs of performance event monitoring registers. Each pair is composed of a |
| Performance Monitoring Configuration (PMC) register and Performance Monitoring |
| Data (PMD) register. The PMC selects the performance event being monitored and |
| the PMD determines the sampling interval. The IA64 Performance Monitoring Unit |
| (PMU) triggers sampling with maskable interrupts. Thus, samples will not occur |
| in sections of the IA64 kernel where interrupts are disabled. |
| </p> |
| <p> |
| None of the advanced features of the Itanium 2 performance monitoring unit |
| such as opcode matching, address range matching, or precise event sampling are |
| supported by this version of OProfile. The Itanium 2 support only maps OProfile's |
| existing interrupt-based model to the PMU hardware. |
| </p> |
| </div> |
| <div class="sect2" title="4.6. PowerPC64 support"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="ppc64"></a>4.6. PowerPC64 support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The performance monitoring unit (PMU) for the IBM PowerPC 64-bit processors |
| consists of between 4 and 8 counters (depending on the model), plus three |
| special purpose registers used for programming the counters -- MMCR0, MMCR1, |
| and MMCRA. Advanced features such as instruction matching and thresholding are |
| not supported by this version of OProfile. |
| </p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Later versions of the IBM POWER5+ processor (beginning with revision 3.0) |
| run the performance monitor unit in POWER6 mode, effectively removing OProfile's |
| access to counters 5 and 6. These two counters are dedicated to counting |
| instructions completed and cycles, respectively. In POWER6 mode, however, the |
| counters do not generate an interrupt on overflow and so are unusable by |
| OProfile. Kernel versions 2.6.23 and higher will recognize this mode |
| and export "ppc64/power5++" as the cpu_type to the oprofilefs pseudo filesystem. |
| OProfile userspace responds to this cpu_type by removing these counters from |
| the list of potential events to count. Without this kernel support, attempts |
| to profile using an event from one of these counters will yield incorrect |
| results -- typically, zero (or near zero) samples in the generated report. |
| </div> |
| </div> |
| <div class="sect2" title="4.7. Cell Broadband Engine support"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="cell-be"></a>4.7. Cell Broadband Engine support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The Cell Broadband Engine (CBE) processor core consists of a PowerPC Processing |
| Element (PPE) and 8 Synergistic Processing Elements (SPE). PPEs and SPEs each |
| consist of a processing unit (PPU and SPU, respectively) and other hardware |
| components, such as memory controllers. |
| </p> |
| <p> |
| A PPU has two hardware threads (aka "virtual CPUs"). The performance monitor |
| unit of the CBE collects event information on one hardware thread at a time. |
| Therefore, when profiling PPE events, |
| OProfile collects the profile based on the selected events by time slicing the |
| performance counter hardware between the two threads. The user must ensure the |
| collection interval is long enough so that the time spent collecting data for |
| each PPU is sufficient to obtain a good profile. |
| </p> |
| <p> |
| To profile an SPU application, the user should specify the SPU_CYCLES event. |
| When starting OProfile with SPU_CYCLES, the opcontrol script enforces certain |
| separation parameters (separate=cpu,lib) to ensure that sufficient information |
| is collected in the sample data in order to generate a complete report. The |
| --merge=cpu option can be used to obtain a more readable report if analyzing |
| the performance of each separate SPU is not necessary. |
| </p> |
| <p> |
| Profiling with an SPU event (events 4100 through 4163) is not compatible with any other |
| event. Furthermore, only one SPU event can be specified at a time. The hardware only |
| supports profiling on one SPU per node at a time. The OProfile kernel code time slices |
| between the eight SPUs to collect data on all SPUs. |
| </p> |
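| <p> |
| The compatibility rule for SPU events can be expressed as a simple check on the selected event numbers. |
| This is a sketch of the rule as stated above, not of the validation that OProfile itself performs; the |
| function name is hypothetical, and any event number outside 4100 through 4163 is treated as a non-SPU event. |
| </p> |

```python
SPU_EVENTS = range(4100, 4164)  # events 4100 through 4163 inclusive


def spu_event_selection_ok(event_numbers):
    """Return True if the selection respects the SPU event restrictions."""
    spu = [number for number in event_numbers if number in SPU_EVENTS]
    # Without SPU events there is nothing to check. With one, it must be
    # the only event selected: no second SPU event and no other event.
    return not spu or len(event_numbers) == 1
```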
| <p> |
| SPU profile reports have some unique characteristics compared to reports for |
| standard architectures: |
| </p> |
| <div class="itemizedlist"> |
| <ul class="itemizedlist" type="disc"> |
| <li class="listitem">Typically no "app name" column. This is really standard OProfile behavior |
| when the report contains samples for just a single application, which is |
| commonly the case when profiling SPUs.</li> |
| <li class="listitem">"CPU" equates to "SPU"</li> |
| <li class="listitem">Specifying '--long-filenames' on the opreport command does not always result |
| in long filenames. This happens when the SPU application code is embedded in |
| the PPE executable or shared library. The embedded SPU ELF data contains only the |
| short filename (i.e., no path information) for the SPU binary file that was used as |
| the source for embedding. Only the short filename is used because |
| the original SPU binary file may not exist or be accessible at runtime. The performance |
| analyst must have sufficient knowledge of the application to be able to correlate the |
| SPU binary image names found in the report to the application's source files. |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3> |
| Compile the application with -g and generate the OProfile report |
| with -g to facilitate finding the right source file(s) on which to focus. |
| </div></li> |
| </ul> |
| </div> |
| </div> |
| <div class="sect2" title="4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="amd-ibs-support"></a>4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Instruction-Based Sampling (IBS) is a new performance measurement technique |
| available on AMD Family 10h processors. Traditional performance counter |
| sampling is not precise enough to isolate performance issues to individual |
| instructions. IBS, however, precisely identifies instructions which are not |
| making the best use of the processor pipeline and memory hierarchy. |
| For more information, please refer to the "Instruction-Based Sampling: |
| A New Performance Analysis Technique for AMD Family 10h Processors" ( |
| <a class="ulink" href="http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf"> |
| http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf</a>). |
| There are two IBS profile types, described in the following sections. |
| </p> |
| <div class="sect3" title="4.8.1. IBS Fetch"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="ibs-fetch"></a>4.8.1. IBS Fetch</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| IBS fetch sampling is a statistical sampling method which counts completed |
| fetch operations. When the number of completed fetch operations reaches the |
| maximum fetch count (the sampling period), IBS tags the fetch operation and |
| monitors that operation until it either completes or aborts. When a tagged |
| fetch completes or aborts, a sampling interrupt is generated and an IBS fetch |
| sample is taken. An IBS fetch sample contains a timestamp, the identifier of |
| the interrupted process, the virtual fetch address, and several event flags |
| and values that describe what happened during the fetch operation. |
| </p> |
| </div> |
| <div class="sect3" title="4.8.2. IBS Op"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="ibs-op"></a>4.8.2. IBS Op</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64 |
| instructions. Two options are available for selecting ops for sampling: |
| </p> |
| <div class="itemizedlist"> |
| <ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| Cycles-based selection counts CPU clock cycles. The op is tagged and monitored |
| when the count reaches a threshold (the sampling period) and a valid op is |
| available. |
| </li> |
| <li class="listitem"> |
| Dispatched op-based selection counts dispatched macro-ops. |
| When the count reaches a threshold, the next valid op is tagged and monitored. |
| </li> |
| </ul> |
| </div> |
| <p> |
| In both cases, an IBS sample is generated only if the tagged op retires. |
| Thus, IBS op event information does not measure speculative execution activity. |
| The execution stages of the pipeline monitor the tagged macro-op. When the |
| tagged macro-op retires, a sampling interrupt is generated and an IBS op |
| sample is taken. An IBS op sample contains a timestamp, the identifier of |
| the interrupted process, the virtual address of the AMD64 instruction from |
| which the op was issued, and several event flags and values that describe |
| what happened when the macro-op executed. |
| </p> |
| </div> |
| <p> |
| IBS profiling is enabled by specifying IBS performance events |
| through the "--event=" option. These events are listed in the output of |
| <code class="function">opcontrol --list-events</code>. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| opcontrol --event=IBS_FETCH_XXX:<count>:<um>:<kernel>:<user> |
| opcontrol --event=IBS_OP_XXX:<count>:<um>:<kernel>:<user> |
| |
| Note: All IBS fetch events must have the same event count and unitmask, |
| as must all IBS op events. |
| </pre> |
| </td> |
| </tr> |
| </table> |
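| <p> |
| The constraint in the note above can be checked before starting the profiler. The following sketch |
| groups event strings by their IBS_FETCH_ or IBS_OP_ prefix and verifies that each group uses a single |
| count and unit mask. The function name and the event names in the usage note are placeholders, not real |
| IBS event names. |
| </p> |

```python
def ibs_events_consistent(specs):
    """Check that IBS fetch events share one count/unit mask, as do IBS op events."""
    seen = {}
    for spec in specs:
        name, count, unit_mask = spec.split(":")[:3]
        for prefix in ("IBS_FETCH_", "IBS_OP_"):
            if name.startswith(prefix):
                setting = (int(count), int(unit_mask, 0))
                # The first event of each type fixes the setting for that type.
                if seen.setdefault(prefix, setting) != setting:
                    return False
    return True
```

| <p> |
| For example, the placeholder pair <code class="constant">IBS_FETCH_A:250000:0:1:1</code> and |
| <code class="constant">IBS_FETCH_B:500000:0:1:1</code> fails the check because the two counts differ. |
| </p> |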
| </div> |
| <div class="sect2" title="4.9. Dangerous counter settings"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="misuse"></a>4.9. Dangerous counter settings</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| OProfile is a low-level profiler which allows continuous profiling at a low overhead cost. |
| If too low a count reset value is set for a counter, the system can become overloaded with counter |
| interrupts and appear to have frozen. Whilst some validation is done, it |
| is not foolproof. |
| </p> |
| <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| This can happen as follows: When the profiler count |
| reaches zero an NMI handler is called which stores the sample values in an internal buffer, then resets the counter |
| to its original value. If the count is very low, a pending NMI can be sent before the NMI handler has |
| completed. Due to the priority of the NMI, the local APIC delivers the pending interrupt immediately after |
| completion of the previous interrupt handler, and control never returns to other parts of the system. |
| In this way the system seems to be frozen. |
| </p> |
| </div> |
| <p>If this happens, it will be impossible to bring the system back to a workable state. |
| There is no way to provide real security against this happening, other than making sure to use a reasonable value |
| for the counter reset. For example, setting the <code class="constant">CPU_CLK_UNHALTED</code> event type with a ridiculously low reset count (e.g. 500) |
| is likely to freeze the system. |
| </p> |
| <p> |
| In short: <span class="command"><strong>Don't try a foolish sample count value</strong></span>. Unfortunately the definition of a foolish value |
| is really dependent on the event type - if ever in doubt, e-mail </p> |
| <div class="address"> |
| <p><code class="email"><<a class="email" href="mailto:oprofile-list@lists.sf.net">oprofile-list@lists.sf.net</a>></code>.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="chapter" title="Chapter 4. Obtaining results"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="results"></a>Chapter 4. Obtaining results</h2> |
| </div> |
| </div> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#profile-spec">1. Profile specifications</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#profile-spec-examples">1.1. Examples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#profile-spec-details">1.2. Profile specification parameters</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#no-results">1.4. What to do when you don't get any results</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opreport">2. Image summaries and symbol summaries (<span class="command"><strong>opreport</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-merging">2.1. Merging separate profiles</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-comparison">2.2. Side-by-side multiple results</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-callgraph">2.3. Callgraph output</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-diff">2.4. Differential profiles with <span class="command"><strong>opreport</strong></span></a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-anon">2.5. Anonymous executable mappings</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-xml">2.6. XML formatted output</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-options">2.7. Options for <span class="command"><strong>opreport</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opannotate">3. Outputting annotated source (<span class="command"><strong>opannotate</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opannotate-finding-source">3.1. Locating source files</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opannotate-details">3.2. Usage of <span class="command"><strong>opannotate</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#getting-jit-reports">4. OProfile results with JIT samples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#opgprof">5. <span class="command"><strong>gprof</strong></span>-compatible output (<span class="command"><strong>opgprof</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opgprof-details">5.1. Usage of <span class="command"><strong>opgprof</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#oparchive">6. Archiving measurements (<span class="command"><strong>oparchive</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#oparchive-details">6.1. Usage of <span class="command"><strong>oparchive</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opimport">7. Converting sample database files (<span class="command"><strong>opimport</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opimport-details">7.1. Usage of <span class="command"><strong>opimport</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| </dl> |
| </div> |
| <p> |
| OK, so the profiler has been running, but it's not much use unless we can get some data out. Fairly often, |
| OProfile does a little <span class="emphasis"><em>too</em></span> good a job of keeping overhead low, and no data reaches |
| the profiler. This can happen on lightly-loaded machines. Remember you can force a dump at any time with: |
| </p> |
| <p> |
| <span class="command"> |
| <strong>opcontrol --dump</strong> |
| </span> |
| </p> |
| <p>Remember to do this before complaining there is no profiling data! |
| Now that we've got some data, it has to be processed. That's the job of <span class="command"><strong>opreport</strong></span>, |
| <span class="command"><strong>opannotate</strong></span>, or <span class="command"><strong>opgprof</strong></span>. |
| </p> |
| <div class="sect1" title="1. Profile specifications"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="profile-spec"></a>1. Profile specifications</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| All of the analysis tools take a <span class="emphasis"><em>profile specification</em></span>. |
| This is a set of definitions that describe which actual profiles should be |
| examined. The simplest profile specification is empty: this will match all |
| the available profile files for the current session (this is what happens |
| when you do <span class="command"><strong>opreport</strong></span>). |
| </p> |
| <p> |
| Specification parameters are of the form <code class="option">name:value[,value]</code>. |
| For example, if I wanted to get a combined symbol summary for |
| <code class="filename">/bin/myprog</code> and <code class="filename">/bin/myprog2</code>, |
| I could do <span class="command"><strong>opreport -l image:/bin/myprog,/bin/myprog2</strong></span>. |
| As a special case, you don't actually need to specify the <code class="option">image:</code> |
| part here: anything left on the command line is assumed to be an |
| <code class="option">image:</code> name. Similarly, if no <code class="option">session:</code> |
| is specified, then <code class="option">session:current</code> is assumed ("current" |
| is a special name for the current or most recent profiling session). |
| </p> |
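| <p> |
| The rules above can be sketched in a few lines of Python. This is only an illustration of the |
| <code class="option">name:value[,value]</code> form and its two defaults, not OProfile's own parser: |
| the function name <code class="literal">parse_profile_spec</code> and the <code class="literal">KNOWN_TAGS</code> |
| set are hypothetical, the set lists only the tags described in this section, and escaped commas are ignored. |
| </p> |

```python
# Illustrative sketch only; this is not OProfile's parser.
# Tag names are limited to the ones described in this section.
KNOWN_TAGS = {
    "archive", "session", "session-exclude", "image", "image-exclude",
    "lib-image", "lib-image-exclude", "event", "count", "unit-mask",
}


def parse_profile_spec(args):
    """Group name:value[,value] arguments into a dict of value lists."""
    spec = {}
    for arg in args:
        name, sep, value = arg.partition(":")
        if not sep or name not in KNOWN_TAGS:
            # Anything without a recognised tag is treated as an image name.
            name, value = "image", arg
        # Simplification: backslash-escaped commas are not handled here.
        spec.setdefault(name, []).extend(value.split(","))
    # A missing session: tag means session:current.
    spec.setdefault("session", ["current"])
    return spec


explicit = parse_profile_spec(["image:/bin/myprog,/bin/myprog2"])
implicit = parse_profile_spec(["/bin/myprog,/bin/myprog2"])
print(explicit == implicit)  # True
print(implicit)
```

| <p> |
| Both calls yield the same dictionary, which mirrors the statement that the |
| <code class="option">image:</code> prefix is optional and that <code class="option">session:current</code> |
| is assumed when no session is given. |
| </p> |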
| <p> |
| In addition to the comma-separated list shown above, some of the |
| specification parameters can take <span class="command"><strong>glob</strong></span>-style |
| values. For example, if I want to see image summaries for all |
| binaries profiled in <code class="filename">/usr/bin/</code>, I could do |
| <span class="command"><strong>opreport image:/usr/bin/\*</strong></span>. Note that the |
| <code class="literal">*</code> must be escaped so that the shell does not try to expand it. |
| </p> |
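| <p> |
| To make the escaping concrete, here is a small Python sketch. It assumes POSIX shell quoting rules (as modelled by |
| <code class="literal">shlex</code>, which tokenises but performs no filename expansion) and assumes |
| <code class="literal">fnmatch</code>-style matching as a stand-in for whatever matching OProfile really does. |
| The image names and the helper <code class="literal">match_images</code> are made up for illustration. |
| </p> |

```python
import shlex
from fnmatch import fnmatchcase

# What the program receives once a POSIX shell has consumed the backslash.
# shlex only tokenises; it never expands the pattern against the filesystem.
argv = shlex.split(r"opreport image:/usr/bin/\*")
print(argv)  # ['opreport', 'image:/usr/bin/*']

# Hypothetical image names, for illustration only.
profiled_images = ["/usr/bin/vim", "/usr/bin/gcc", "/bin/bash"]


def match_images(pattern, images):
    """Return the images whose full name matches the glob pattern."""
    return [path for path in images if fnmatchcase(path, pattern)]


# Strip the "image:" tag and match the remaining pattern.
pattern = argv[1].partition(":")[2]
matched = match_images(pattern, profiled_images)
print(matched)  # ['/usr/bin/vim', '/usr/bin/gcc']
```

| <p> |
| The pattern arrives intact as a single argument, and the matching is then done by the tool rather than by the shell. |
| </p> |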
| <p> |
| For <span class="command"><strong>opreport</strong></span>, profile specifications can be used to |
| define two profiles, giving differential output. This is done by |
| enclosing each of the two specifications within curly braces, as shown |
| in the examples below. Any specifications outside of curly braces are |
| shared across both. |
| </p> |
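| <p> |
| The way shared and brace-enclosed parts combine can be sketched as follows. This is a simplified model, not |
| OProfile's implementation: it assumes each brace arrives as its own whitespace-separated argument, as in the |
| manual's examples, and the function name <code class="literal">split_differential</code> is hypothetical. |
| </p> |

```python
def split_differential(tokens):
    """Split arguments into shared specs and brace-enclosed groups."""
    shared, groups, current = [], [], None
    for tok in tokens:
        if tok == "{":
            if current is not None:
                raise ValueError("nested braces are not supported")
            current = []
        elif tok == "}":
            if current is None:
                raise ValueError("closing brace without an opening brace")
            groups.append(current)
            current = None
        elif current is None:
            shared.append(tok)   # outside braces: applies to every profile
        else:
            current.append(tok)  # inside braces: applies to one profile
    if current is not None:
        raise ValueError("unclosed brace")
    return shared, groups


tokens = "/bin/bash { archive:./orig } { archive:./new }".split()
shared, groups = split_differential(tokens)
# Each profile is the shared part plus its own brace group.
profiles = [shared + group for group in groups]
print(profiles)
# [['/bin/bash', 'archive:./orig'], ['/bin/bash', 'archive:./new']]
```

| <p> |
| An empty group such as <code class="literal">{ }</code> contributes nothing of its own, so that profile is described |
| by the shared specifications alone. |
| </p> |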
| <div class="sect2" title="1.1. Examples"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="profile-spec-examples"></a>1.1. Examples</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Image summaries for all profiles with <code class="constant">DATA_MEM_REFS</code> |
| samples in the saved session called "stresstest": |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport session:stresstest event:DATA_MEM_REFS |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Symbol summary for the application called "test_sym53c8xx,9xx". Note that the |
| escaping is necessary because <code class="option">image:</code> takes a comma-separated list. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport -l ./test/test_sym53c8xx\,9xx |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Image summaries for all binaries in the <code class="filename">test</code> directory, |
| except <code class="filename">boring-test</code>: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport image:./test/\* image-exclude:./test/boring-test |
| </pre> |
| </td> |
| </tr> |
| </table> |
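| <p> |
| The combination of an include pattern and an exclude pattern can be pictured with the Python sketch below. |
| It compares names as plain strings with <code class="literal">fnmatch</code>-style globs, which is an assumption |
| made for illustration; the image names and the helper <code class="literal">select_images</code> are hypothetical, |
| and the real tool's path handling may differ. |
| </p> |

```python
from fnmatch import fnmatchcase


def select_images(images, include, exclude):
    """Keep images matching any include pattern and no exclude pattern."""
    def matches_any(path, patterns):
        return any(fnmatchcase(path, pat) for pat in patterns)

    return [
        path for path in images
        if matches_any(path, include) and not matches_any(path, exclude)
    ]


# Hypothetical image names, for illustration only.
images = ["./test/fast-test", "./test/boring-test", "./bin/tool"]
selected = select_images(images, ["./test/*"], ["./test/boring-test"])
print(selected)  # ['./test/fast-test']
```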
| <p> |
| Differential profile of a binary stored in two archives: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport -l /bin/bash { archive:./orig } { archive:./new } |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Differential profile of an archived binary with the current session: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport -l /bin/bash { archive:./orig } { } |
| </pre> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="sect2" title="1.2. Profile specification parameters"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="profile-spec-details"></a>1.2. Profile specification parameters</h3> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">archive:</code> |
| <span class="emphasis"> |
| <em>archivepath</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A path to an archive made with <span class="command"><strong>oparchive</strong></span>. |
| Absence of this tag, unlike others, means "the current system", |
| equivalent to specifying "archive:". |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">session:</code> |
| <span class="emphasis"> |
| <em>sessionlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A comma-separated list of session names in which to resolve profiles. Absence of this |
| tag, unlike others, means "the current session", equivalent to |
| specifying "session:current". |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">session-exclude:</code> |
| <span class="emphasis"> |
| <em>sessionlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A comma-separated list of sessions to exclude. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">image:</code> |
| <span class="emphasis"> |
| <em>imagelist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A comma-separated list of image names to resolve. Each entry may be a relative |
| path, a <span class="command"><strong>glob</strong></span>-style name, or a full path, e.g.</p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opreport 'image:/usr/bin/oprofiled,*op*,./opreport'</pre> |
| </td> |
| </tr> |
| </table> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">image-exclude:</code> |
| <span class="emphasis"> |
| <em>imagelist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Same as <code class="option">image:</code>, but the matching images are excluded. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">lib-image:</code> |
| <span class="emphasis"> |
| <em>imagelist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Same as <code class="option">image:</code>, but matches only images that belong to |
| a particular primary binary image (namely, an application). This only |
| makes sense if you're using <code class="option">--separate</code>. |
| It includes kernel modules and the kernel when using |
| <code class="option">--separate=kernel</code>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">lib-image-exclude:</code> |
| <span class="emphasis"> |
| <em>imagelist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Same as <code class="option">lib-image:</code>, but the matching images |
| are excluded. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">event:</code> |
| <span class="emphasis"> |
| <em>eventlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| The symbolic event name to match on, e.g. <code class="option">event:DATA_MEM_REFS</code>. |
| You can pass a list of events for side-by-side comparison with <span class="command"><strong>opreport</strong></span>. |
| When using the timer interrupt, the event is always "TIMER". |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">count:</code> |
| <span class="emphasis"> |
| <em>eventcountlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| The event count to match on, e.g. <code class="option">event:DATA_MEM_REFS count:30000</code>. |
| Note that this value refers to the setting used for <span class="command"><strong>opcontrol</strong></span> |
| only, and has nothing to do with the sample counts in the profile data |
| itself. |
| You can pass a list of event counts for side-by-side comparison with <span class="command"><strong>opreport</strong></span>. |
| When using the timer interrupt, the count is always 0 (indicating it cannot be set). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">unit-mask:</code> |
| <span class="emphasis"> |
| <em>masklist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| The unit mask value of the event to match on
|