|  | <?xml version="1.0" encoding="ISO-8859-1"?> | 
|  | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | 
|  | <html xmlns="http://www.w3.org/1999/xhtml"> | 
|  | <head> | 
|  | <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> | 
|  | <title>OProfile manual</title> | 
|  | <meta name="generator" content="DocBook XSL Stylesheets V1.75.2" /> | 
|  | </head> | 
|  | <body> | 
|  | <div class="book" title="OProfile manual"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h1 class="title"><a id="oprofile-guide"></a>OProfile manual</h1> | 
|  | </div> | 
|  | <div> | 
|  | <div class="authorgroup"> | 
|  | <div class="author"> | 
|  | <h3 class="author"><span class="firstname">John</span> <span class="surname">Levon</span></h3> | 
|  | <div class="affiliation"> | 
|  | <div class="address"> | 
|  | <p> | 
|  | <code class="email">&lt;<a class="email" href="mailto:levon@movementarian.org">levon@movementarian.org</a>&gt;</code> | 
|  | </p> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div> | 
|  | <p class="copyright">Copyright © 2000-2004 Victoria University of Manchester, John Levon and others</p> | 
|  | </div> | 
|  | </div> | 
|  | <hr /> | 
|  | </div> | 
|  | <div class="toc"> | 
|  | <p> | 
|  | <b>Table of Contents</b> | 
|  | </p> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="chapter"> | 
|  | <a href="#introduction">1. Introduction</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#applications">1. Applications of OProfile</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#requirements">2. System requirements</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#resources">3. Internet resources</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#install">4. Installation</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#uninstall">5. Uninstalling OProfile</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="chapter"> | 
|  | <a href="#overview">2. Overview</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#getting-started">1. Getting started</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#tools-overview">2. Tools summary</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="chapter"> | 
|  | <a href="#controlling">3. Controlling the profiler</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#controlling-daemon">1. Using <span class="command"><strong>opcontrol</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opcontrolexamples">1.1. Examples</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#eventspec">1.2. Specifying performance counter events</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#setup-jit">2. Setting up the JIT profiling feature</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#setup-jit-jvm">2.1. JVM instrumentation</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#oprofile-gui">3. Using <span class="command"><strong>oprof_start</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#detailed-parameters">4. Configuration details</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#hardware-counters">4.1. Hardware performance counters</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#rtc">4.2. OProfile in RTC mode</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#timer">4.3. OProfile in timer interrupt mode</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#p4">4.4. Pentium 4 support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#ia64">4.5. Intel Itanium 2 support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#ppc64">4.6. PowerPC64 support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#cell-be">4.7. Cell Broadband Engine support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#misuse">4.9. Dangerous counter settings</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="chapter"> | 
|  | <a href="#results">4. Obtaining results</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#profile-spec">1. Profile specifications</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#profile-spec-examples">1.1. Examples</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#profile-spec-details">1.2. Profile specification parameters</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#no-results">1.4. What to do when you don't get any results</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#opreport">2. Image summaries and symbol summaries (<span class="command"><strong>opreport</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-merging">2.1. Merging separate profiles</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-comparison">2.2. Side-by-side multiple results</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-callgraph">2.3. Callgraph output</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-diff">2.4. Differential profiles with <span class="command"><strong>opreport</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-anon">2.5. Anonymous executable mappings</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-xml">2.6. XML formatted output</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-options">2.7. Options for <span class="command"><strong>opreport</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#opannotate">3. Outputting annotated source (<span class="command"><strong>opannotate</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opannotate-finding-source">3.1. Locating source files</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opannotate-details">3.2. Usage of <span class="command"><strong>opannotate</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#getting-jit-reports">4. OProfile results with JIT samples</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#opgprof">5. <span class="command"><strong>gprof</strong></span>-compatible output (<span class="command"><strong>opgprof</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opgprof-details">5.1. Usage of <span class="command"><strong>opgprof</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#oparchive">6. Archiving measurements (<span class="command"><strong>oparchive</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#oparchive-details">6.1. Usage of <span class="command"><strong>oparchive</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#opimport">7. Converting sample database files (<span class="command"><strong>opimport</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opimport-details">7.1. Usage of <span class="command"><strong>opimport</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="chapter"> | 
|  | <a href="#interpreting">5. Interpreting profiling results</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#irq-latency">1. Profiling interrupt latency</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#kernel-profiling">2. Kernel profiling</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#irq-masking">2.1. Interrupt masking</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#idle">2.2. Idle time</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#kernel-modules">2.3. Profiling kernel modules</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#interpreting-callgraph">3. Interpreting call-graph profiles</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#debug-info">4. Inaccuracies in annotated source</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#effect-of-optimizations">4.1. Side effects of optimizations</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#prologues">4.2. Prologues and epilogues</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#inlined-function">4.3. Inlined functions</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#wrong-linenr-info">4.4. Inaccuracy in line number information</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#symbol-without-debug-info">5. Assembly functions</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#overlapping-symbols">6. Overlapping symbols in JITed code</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#hidden-cost">7. Other discrepancies</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="chapter"> | 
|  | <a href="#ack">6. Acknowledgments</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </div> | 
|  | <div class="chapter" title="Chapter 1. Introduction"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title"><a id="introduction"></a>Chapter 1. Introduction</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="toc"> | 
|  | <p> | 
|  | <b>Table of Contents</b> | 
|  | </p> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#applications">1. Applications of OProfile</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#requirements">2. System requirements</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#resources">3. Internet resources</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#install">4. Installation</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#uninstall">5. Uninstalling OProfile</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </div> | 
|  | <p> | 
|  | This manual applies to OProfile version 0.9.7-rc2. | 
|  | OProfile is a profiling system for Linux 2.2/2.4/2.6 systems on a number of architectures. It is capable of profiling | 
|  | all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries | 
|  | to binaries. It runs transparently in the background, collecting information at low overhead. These | 
|  | features make it ideal for profiling entire systems to find bottlenecks under real-world workloads. | 
|  | </p> | 
|  | <p> | 
|  | Many CPUs provide "performance counters": hardware registers that count "events" such as | 
|  | cache misses or CPU cycles. OProfile builds profiles of code from these events: | 
|  | each time a certain (configurable) number of events has occurred, the current program counter (PC) value is recorded. | 
|  | These samples are aggregated into a profile for each binary image.</p> | 
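|  | <p> | 
|  | As a rough illustration of the sampling arithmetic described above, the following shell sketch | 
|  | estimates how many samples a run would produce. The helper name <code class="code">expected_samples</code> | 
|  | and the example numbers are assumptions made for illustration; they are not part of OProfile. | 
|  | </p> | 

```shell
# expected_samples EVENTS COUNT
# Hypothetical helper: prints how many samples to expect when EVENTS
# hardware events occur and one sample is recorded every COUNT events.
expected_samples() {
    events=$1
    count=$2
    if [ "$count" -le 0 ]; then
        echo "count must be positive" >&2
        return 1
    fi
    # Integer division: a partial interval does not produce a sample.
    echo $(( events / count ))
}

# Example: 2,000,000,000 CPU cycles, one sample every 100,000 cycles.
expected_samples 2000000000 100000
```

|  | <p> | 
|  | A smaller count yields more samples and finer detail, at the price of higher profiling overhead. | 
|  | </p> | 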
|  | <p> | 
|  | Some hardware setups do not allow OProfile to use performance counters: in these cases, no | 
|  | events are available, and OProfile operates in timer/RTC mode, as described in later chapters. | 
|  | </p> | 
|  | <div class="sect1" title="1. Applications of OProfile"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="applications"></a>1. Applications of OProfile</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | OProfile is useful in a number of situations. You might want to use OProfile when you: | 
|  | </p> | 
|  | <div class="itemizedlist"> | 
|  | <ul class="itemizedlist" type="disc"> | 
|  | <li class="listitem"> | 
|  | <p>need low overhead</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>cannot use highly intrusive profiling methods</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>need to profile interrupt handlers</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>need to profile an application and its shared libraries</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>need to profile dynamically compiled code of supported virtual machines (see <a class="xref" href="#jitsupport" title="1.1. Support for dynamically compiled (JIT) code">Section 1.1, “Support for dynamically compiled (JIT) code”</a>)</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>need to capture the performance behaviour of an entire system</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>want to examine hardware effects such as cache misses</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>want detailed source annotation</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>want instruction-level profiles</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>want call-graph profiles</p> | 
|  | </li> | 
|  | </ul> | 
|  | </div> | 
|  | <p> | 
|  | OProfile is not a panacea. OProfile might not be a complete solution when you: | 
|  | </p> | 
|  | <div class="itemizedlist"> | 
|  | <ul class="itemizedlist" type="disc"> | 
|  | <li class="listitem"> | 
|  | <p>require call graph profiles on platforms other than 2.6/x86</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>don't have root permissions</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>require 100% instruction-accurate profiles</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>need function call counts or an interstitial profiling API</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>cannot tolerate any disturbance to the system whatsoever</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>need to profile interpreted or dynamically compiled code of non-supported virtual machines</p> | 
|  | </li> | 
|  | </ul> | 
|  | </div> | 
|  | <div class="sect2" title="1.1. Support for dynamically compiled (JIT) code"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="jitsupport"></a>1.1. Support for dynamically compiled (JIT) code</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Older versions of OProfile were not capable of attributing samples to symbols from dynamically | 
|  | compiled code, i.e. "just-in-time (JIT) code". Typical JIT compilers load the JIT code into | 
|  | anonymous memory regions. OProfile reported the samples from such code, but the attribution | 
|  | provided was simply: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">"anon: &lt;tgid&gt;&lt;address range&gt;" </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Due to this limitation, it wasn't possible to profile applications executed by virtual machines (VMs) | 
|  | like the Java Virtual Machine. OProfile now contains an infrastructure to support JITed code. | 
|  | A development library is provided to allow developers | 
|  | to add support for any VM that produces dynamically compiled code (see the <span class="emphasis"><em>OProfile JIT agent | 
|  | developer guide</em></span>). | 
|  | In addition, built-in support is included for the following:</p> | 
|  | <div class="itemizedlist"> | 
|  | <ul class="itemizedlist" type="disc"> | 
|  | <li class="listitem">JVMTI agent library for Java (1.5 and higher)</li> | 
|  | <li class="listitem">JVMPI agent library for Java (1.5 and lower)</li> | 
|  | </ul> | 
|  | </div> | 
|  | <p> | 
|  | For information on how to use OProfile's JIT support, see <a class="xref" href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, “Setting up the JIT profiling feature”</a>. | 
|  | </p> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="2. System requirements"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="requirements"></a>2. System requirements</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term">Linux kernel 2.2/2.4/2.6</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | OProfile uses a kernel module that can be compiled for | 
|  | 2.2.11 or later and for 2.4 kernels. Kernel 2.4.10 or above is required if you use the | 
|  | boot-time kernel option <code class="option">nosmp</code>.  2.6 kernels are supported through the in-kernel | 
|  | OProfile driver. Note that only 32-bit x86 and IA64 are supported on 2.2/2.4 kernels. | 
|  | </p> | 
|  | <p> | 
|  | 2.6 kernels are strongly recommended. Under 2.4, OProfile may cause system crashes if power | 
|  | management is used, or the BIOS does not correctly deal with local APICs. | 
|  | </p> | 
|  | <p> | 
|  | To use OProfile's JIT support, kernel version 2.6.13 or later is required. | 
|  | In earlier kernel versions, anonymous memory regions are not reported to OProfile, which results | 
|  | in profiling reports without any samples in these regions. | 
|  | </p> | 
|  | <p> | 
|  | PPC64 processors (Power4/Power5/PPC970, etc.) require a recent (> 2.6.5) kernel with the line | 
|  | <code class="constant">#define PV_970</code> present in <code class="filename">include/asm-ppc64/processor.h</code>. | 
|  |  | 
|  | </p> | 
|  | <p> | 
|  | Profiling the Cell Broadband Engine PowerPC Processing Element (PPE) requires a kernel version | 
|  | of 2.6.18 or more recent. | 
|  | Profiling the Cell Broadband Engine Synergistic Processing Element (SPE) requires a kernel version | 
|  | of 2.6.22 or more recent.  Additionally, full support of SPE profiling requires a BFD library | 
|  | from binutils code dated January 2007 or later.  To ensure the proper BFD support exists, run | 
|  | the <code class="code">configure</code> utility with <code class="code">--with-target=cell-be</code>. | 
|  |  | 
|  | Profiling the Cell Broadband Engine using SPU events requires a kernel version of 2.6.29-rc1 | 
|  | or  more recent. | 
|  |  | 
|  | </p> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Attempting to profile SPEs with kernel versions older than 2.6.22 may cause the | 
|  | system to crash.</div> | 
|  | <p> | 
|  | </p> | 
|  | <p> | 
|  | Instruction-Based Sampling (IBS) profiling on AMD Family 10h processors requires | 
|  | kernel version 2.6.28-rc2 or later. | 
|  | </p> | 
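|  | <p> | 
|  | The kernel version thresholds listed above can be checked with a short shell sketch such as the following. | 
|  | The helper names <code class="code">version_at_least</code> and <code class="code">report_features</code> are | 
|  | hypothetical, the comparison assumes a <code class="code">sort</code> that supports the <code class="option">-V</code> | 
|  | (version sort) option, as GNU coreutils does, and release-candidate suffixes are dropped, so a requirement such as | 
|  | 2.6.28-rc2 is conservatively treated as 2.6.28. | 
|  | </p> | 

```shell
# version_at_least HAVE WANT
# Hypothetical helper: succeeds when dotted version HAVE is >= WANT.
# Assumes "sort -V" (version sort) is available.
version_at_least() {
    have=${1%%-*}      # drop suffixes such as "-rc2" or "-5-amd64"
    want=${2%%-*}
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# report_features RUNNING_VERSION
# Prints one line per feature from the requirements above.
report_features() {
    running=$1
    while read -r minimum feature; do
        if version_at_least "$running" "$minimum"; then
            echo "ok      $feature"
        else
            echo "too old $feature (needs $minimum)"
        fi
    done <<EOF
2.6.13 JIT support
2.6.18 Cell PPE profiling
2.6.22 Cell SPE profiling
2.6.28 AMD IBS profiling
2.6.29 Cell SPU events
EOF
}

# Check the running kernel.
report_features "$(uname -r)"
```

|  | <p> | 
|  | The sketch only compares version strings; it does not verify that the OProfile driver is actually enabled in the kernel. | 
|  | </p> | 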
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">modutils 2.4.6 or above</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | You should have modutils 2.4.6 or higher installed (in practice, earlier versions work well in almost all | 
|  | cases). | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">Supported architecture</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | For Intel IA32, a CPU with either a P6 generation or Pentium 4 core is | 
|  | required. In marketing terms this translates to anything | 
|  | between an Intel Pentium Pro (not Pentium Classics) and | 
|  | a Pentium 4 / Xeon, including all Celerons.  The AMD | 
|  | Athlon, Opteron, Phenom, and Turion CPUs are also supported.  Other IA32 | 
|  | CPU types only support the RTC mode of OProfile; see | 
|  | the section "OProfile in RTC mode" later in this manual for details.  Hyper-threaded Pentium 4 CPUs | 
|  | are not supported in 2.4. For 2.4 kernels, the Intel | 
|  | IA-64 CPUs are also supported. For 2.6 kernels, there is additionally | 
|  | support for Alpha processors, MIPS, ARM, x86-64, sparc64, ppc64, AVR32, and, | 
|  | in timer mode, PA-RISC and s390. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">Uniprocessor or SMP</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | SMP machines are fully supported. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">Required libraries</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | These libraries are required: <code class="filename">popt</code>, <code class="filename">bfd</code>, | 
|  | <code class="filename">liberty</code> (Debian users: libiberty is provided in the binutils-dev package), <code class="filename">dl</code>, | 
|  | plus the standard C++ libraries. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">Required user account</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | For secure processing of sample data from JIT virtual machines (e.g., Java), | 
|  | the special user account "oprofile" must exist on the system.  The 'configure' | 
|  | and 'make install' operations will print warning messages if this | 
|  | account is not found.  If you intend to profile JITed code, you must create | 
|  | a group account named 'oprofile' and then create the 'oprofile' user account, | 
|  | setting the default group to 'oprofile'.  A runtime error message is printed to | 
|  | the oprofile daemon log when processing JIT samples if this special user | 
|  | account cannot be found. | 
|  | </p> | 
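|  | <p> | 
|  | One possible way to create the group and user described above is sketched below. It assumes the | 
|  | <code class="code">groupadd</code> and <code class="code">useradd</code> utilities found on most Linux distributions, | 
|  | and it must run as root to make real changes. The function name <code class="code">ensure_account</code> and the | 
|  | <code class="code">RUN</code> dry-run variable are hypothetical. | 
|  | </p> | 

```shell
# ensure_account NAME
# Hypothetical helper: creates group NAME and user NAME (with NAME as the
# default group) when the user does not exist yet.
# Set RUN=echo to print the commands instead of executing them.
ensure_account() {
    name=$1
    if id -u "$name" >/dev/null 2>&1; then
        echo "user '$name' already exists"
        return 0
    fi
    # Create the group first, unless /etc/group already lists it.
    if ! grep -q "^$name:" /etc/group 2>/dev/null; then
        ${RUN:-} groupadd "$name" || return 1
    fi
    # -g sets the user's default (primary) group.
    ${RUN:-} useradd -g "$name" "$name"
}

# Dry run: show what would be executed for the 'oprofile' account.
RUN=echo ensure_account oprofile
```

|  | <p> | 
|  | Once the printed commands look right, run the function as root without <code class="code">RUN=echo</code>. | 
|  | </p> | 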
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">OProfile GUI</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | The use of the GUI to start the profiler requires the <code class="filename">Qt</code> library. | 
|  | Either <code class="filename">Qt 3</code> or <code class="filename">Qt 4</code> should work. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <acronym class="acronym">ELF</acronym> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Probably not too strenuous a requirement, but older <acronym class="acronym">A.OUT</acronym> binaries/libraries are not supported. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">K&R coding style</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | OK, so it's not really a requirement, but I wish it was... | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="3. Internet resources"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="resources"></a>3. Internet resources</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term">Web page</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | There is a web page (which you may be reading now) at | 
|  | <a class="ulink" href="http://oprofile.sf.net/">http://oprofile.sf.net/</a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">Download</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | You can download a source tarball or check out code from | 
|  | the code repository on the SourceForge project page, | 
|  | <a class="ulink" href="http://sf.net/projects/oprofile/">http://sf.net/projects/oprofile/</a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">Mailing list</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | There is a low-traffic OProfile-specific mailing list, details at | 
|  | <a class="ulink" href="http://sf.net/mail/?group_id=16191">http://sf.net/mail/?group_id=16191</a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">Bug tracker</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | There is a bug tracker for OProfile at SourceForge, | 
|  | <a class="ulink" href="http://sf.net/tracker/?group_id=16191&amp;atid=116191">http://sf.net/tracker/?group_id=16191&amp;atid=116191</a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">IRC channel</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Several OProfile developers and users sometimes hang out on channel <span class="command"><strong>#oprofile</strong></span> | 
|  | on the <a class="ulink" href="http://oftc.net">OFTC</a> network. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="4. Installation"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="install"></a>4. Installation</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | First you need to build OProfile and install it. <span class="command"><strong>./configure</strong></span>, <span class="command"><strong>make</strong></span>, <span class="command"><strong>make install</strong></span> | 
|  | is often all you need, but note these arguments to <span class="command"><strong>./configure</strong></span>: | 
|  | </p> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--with-linux</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Use this option to specify the location of the kernel source tree you wish | 
|  | to compile against. The kernel module is built against this source and | 
|  | will only work with a running kernel built from the same source with | 
|  | exactly the same options, so it is important to specify this option if you build | 
|  | the kernel module. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--with-java</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Use this option if you need to profile Java applications.  Also, see | 
|  | <a class="xref" href="#requirements" title="2. System requirements">Section 2, “System requirements”</a>, "Required user account".  This option | 
|  | is used to specify the location of the Java Development Kit (JDK) | 
|  | source tree you wish to use. This is necessary to get the interface description | 
|  | of the JVMPI (or JVMTI) interface to compile the JIT support code successfully. | 
|  | </p> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> | 
|  | <h3 class="title">Note</h3> | 
|  | <p> | 
|  | The Java Runtime Environment (JRE) does not include the development | 
|  | files that are required to compile the JIT support code, so the full | 
|  | JDK must be installed in order to use this option. | 
|  | </p> | 
|  | </div> | 
|  | <p> | 
|  | By default, the OProfile JIT support libraries will be installed in | 
|  | <code class="filename">&lt;oprof_install_dir&gt;/lib/oprofile</code>.  To build | 
|  | and install OProfile and the JIT support libraries as 64-bit, you can | 
|  | do something like the following: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # CFLAGS="-m64" CXXFLAGS="-m64" ./configure \ | 
|  | --with-kernel-support --with-java={my_jdk_installdir} \ | 
|  | --libdir=/usr/local/lib64 | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | </p> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> | 
|  | <h3 class="title">Note</h3> | 
|  | <p> | 
|  | If you encounter errors building 64-bit, you should | 
|  | install libtool 1.5.26 or later since that release of | 
|  | libtool fixes known problems for certain platforms. | 
|  | If you install libtool into a non-standard location, | 
|  | you'll need to edit the invocation of 'aclocal' in | 
|  | OProfile's autogen.sh as follows (assume an install | 
|  | location of /usr/local): | 
|  | </p> | 
|  | <p> | 
|  | <code class="code">aclocal -I m4 -I /usr/local/share/aclocal</code> | 
|  | </p> | 
|  | </div> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--with-kernel-support</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Use this option with 2.6 and above kernels to indicate the | 
|  | kernel provides the OProfile device driver. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--with-qt-dir/includes/libraries</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Specify the location of Qt headers and libraries. It defaults to searching in | 
|  | <code class="constant">$QTDIR</code> if these are not specified. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <a id="disable-werror"></a> | 
|  | <span class="term"> | 
|  | <code class="option">--disable-werror</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Development versions of OProfile build by | 
|  | default with <code class="option">-Werror</code>. This option turns | 
|  | <code class="option">-Werror</code> off. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <a id="disable-optimization"></a> | 
|  | <span class="term"> | 
|  | <code class="option">--disable-optimization</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Disable the <code class="option">-O2</code> compiler flag | 
|  | (useful if you discover an OProfile bug and want to give a useful | 
|  | back-trace etc.) | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | <p> | 
|  | You'll need to have a configured kernel source for the current kernel | 
|  | to build the module for 2.4 kernels. Since every distribution ships a different kernel, it is unlikely that the running kernel matches the configured source | 
|  | you installed. The safest approach is to recompile your own kernel, boot it, and then compile OProfile. It is also recommended that if you have a | 
|  | uniprocessor machine, you enable the local APIC / IO_APIC support for | 
|  | your kernel (this is automatically enabled for SMP kernels). With many BIOSes, on a uniprocessor (UP) kernel >= 2.6.9, it is not sufficient to enable the local APIC in the kernel configuration; you must also turn it on explicitly at boot time by passing the "lapic" option to the kernel. On | 
|  | machines with power management, such as laptops, the power management | 
|  | must be turned off when using OProfile with 2.4 kernels. The power management software | 
|  | in the BIOS cannot handle the non-maskable interrupts (NMIs) used by | 
|  | OProfile for data collection. If you use the NMI watchdog, be aware that | 
|  | the watchdog is disabled when profiling starts, and not re-enabled until the | 
|  | OProfile module is removed (or, in 2.6, when OProfile is not running). If you compile OProfile for | 
|  | a 2.2 kernel, you must be root to compile the module. If you are using | 
|  | 2.6 kernels or higher, you do not need kernel source, as long as the | 
|  | OProfile driver is enabled; additionally, you should not need to disable | 
|  | power management. | 
|  | </p> | 
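|  | <p> | 
|  | As a rough illustration of the "lapic" requirement described above, the following sketch checks whether a kernel command line already carries that option. The function name has_lapic is made up for this example, and reading the command line from /proc/cmdline is an assumption about your system rather than something OProfile does for you. | 
|  | </p> | 
```shell
# has_lapic is a hypothetical helper, not part of OProfile.
# $1: kernel command line text, for example the contents of /proc/cmdline.
# Succeeds only if "lapic" appears as a separate word, so "nolapic" does not match.
has_lapic() {
    case " $1 " in
        *" lapic "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (assumes a Linux system that exposes /proc/cmdline):
#   if has_lapic "$(cat /proc/cmdline)"; then echo "lapic is set"; fi
```
|  | <p> | 
|  | If the check fails on a uniprocessor kernel, adding the option is done in your boot loader configuration, which varies between distributions and is not covered here. | 
|  | </p> | 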
|  | <p> | 
|  | Please note that you must save or have available the <code class="filename">vmlinux</code> file | 
|  | generated during a kernel compile, as OProfile needs it (you can use | 
|  | <code class="option">--no-vmlinux</code>, but this will prevent kernel profiling). | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect1" title="5. Uninstalling OProfile"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="uninstall"></a>5. Uninstalling OProfile</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | You must have the source tree available to uninstall OProfile; a <span class="command"><strong>make uninstall</strong></span> will | 
|  | remove all installed files except your configuration file in the directory <code class="filename">~/.oprofile</code>. | 
|  | </p> | 
|  | </div> | 
|  | </div> | 
|  | <div class="chapter" title="Chapter 2. Overview"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title"><a id="overview"></a>Chapter 2. Overview</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="toc"> | 
|  | <p> | 
|  | <b>Table of Contents</b> | 
|  | </p> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#getting-started">1. Getting started</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#tools-overview">2. Tools summary</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </div> | 
|  | <div class="sect1" title="1. Getting started"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="getting-started"></a>1. Getting started</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Before you can use OProfile, you must set it up. The minimum setup required for this | 
|  | is to tell OProfile where the <code class="filename">vmlinux</code> file corresponding to the | 
|  | running kernel is, for example : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">opcontrol --vmlinux=/boot/vmlinux-`uname -r`</pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | If you don't want to profile the kernel itself, | 
|  | you can tell OProfile you don't have a <code class="filename">vmlinux</code> file : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">opcontrol --no-vmlinux</pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Now we are ready to start the daemon (<span class="command"><strong>oprofiled</strong></span>) which collects | 
|  | the profile data : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">opcontrol --start</pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | When you want to stop profiling, you can do so with : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">opcontrol --shutdown</pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Note that unlike <span class="command"><strong>gprof</strong></span>, no instrumentation (<code class="option">-pg</code> | 
|  | and <code class="option">-a</code> options to <span class="command"><strong>gcc</strong></span>) | 
|  | is necessary. | 
|  | </p> | 
|  | <p> | 
|  | Periodically (or on <span class="command"><strong>opcontrol --shutdown</strong></span> or <span class="command"><strong>opcontrol --dump</strong></span>) | 
|  | the profile data is written out into the $SESSION_DIR/samples directory (by default at <code class="filename">/var/lib/oprofile/samples</code>). | 
|  | These profile files cover shared libraries, applications, the kernel (vmlinux), and kernel modules. | 
|  | You can clear the profile data (at any time) with <span class="command"><strong>opcontrol --reset</strong></span>. | 
|  | </p> | 
|  | <p> | 
|  | To place these sample database files in a specific directory instead of the default location (<code class="filename">/var/lib/oprofile</code>) use the <code class="option">--session-dir=dir</code> option. You must also pass <code class="option">--session-dir</code> on later commands so that the tools continue using this directory. (In the future, we should allow this to be specified in an environment variable.) : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">opcontrol --no-vmlinux --session-dir=/home/me/tmpsession</pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">opcontrol --start --session-dir=/home/me/tmpsession</pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | You can get summaries of this data in a number of ways at any time. To get a summary of | 
|  | data across the entire system for all of these profiles, you can do : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">opreport [--session-dir=dir]</pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Or to get a more detailed summary, for a particular image, you can do something like : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">opreport -l /boot/vmlinux-`uname -r`</pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | There are also a number of other ways of presenting the data, as described later in this manual. | 
|  | Note that OProfile will choose a default profiling setup for you. However, there are a number | 
|  | of options you can pass to <span class="command"><strong>opcontrol</strong></span> if you need to change something, | 
|  | also detailed later. | 
|  | </p> | 
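|  | <p> | 
|  | The steps in this section can be strung together in a small wrapper. The following is only a sketch: profile_run, OPCONTROL and OPREPORT are names invented for this example (they are not OProfile settings), and the two override variables exist solely so that the command sequence can be previewed without root privileges. | 
|  | </p> | 
```shell
# profile_run is a hypothetical wrapper, not an OProfile tool.
# OPCONTROL and OPREPORT are illustrative overrides; when unset, the real
# opcontrol and opreport commands are used.
# $1: session directory; remaining arguments: the workload to profile.
profile_run() {
    session_dir=$1
    shift
    # Minimum setup: no kernel profiling, samples kept in the chosen directory.
    ${OPCONTROL:-opcontrol} --no-vmlinux --session-dir="$session_dir" || return 1
    ${OPCONTROL:-opcontrol} --start --session-dir="$session_dir" || return 1
    "$@"    # run the workload while the daemon collects samples
    ${OPCONTROL:-opcontrol} --shutdown
    # System-wide summary of the samples stored in the session directory.
    ${OPREPORT:-opreport} --session-dir="$session_dir"
}

# Preview the commands without running them (names and paths are examples):
#   OPCONTROL="echo opcontrol" OPREPORT="echo opreport"
#   profile_run /home/me/tmpsession ./my_program
```
|  | <p> | 
|  | Keeping the session directory as a single argument makes it harder to forget <code class="option">--session-dir</code> on one of the later commands. | 
|  | </p> | 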
|  | </div> | 
|  | <div class="sect1" title="2. Tools summary"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="tools-overview"></a>2. Tools summary</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | This section gives a brief description of the available OProfile utilities and their purpose. | 
|  | </p> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="filename">ophelp</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | This utility lists the available events and short descriptions. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="filename">opcontrol</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Used for controlling the OProfile data collection, discussed in <a class="xref" href="#controlling" title="Chapter 3. Controlling the profiler">Chapter 3, <i>Controlling the profiler</i></a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="filename">agent libraries</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Used by virtual machines (like the Java VM) to record information about JITed code being profiled. See <a class="xref" href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, “Setting up the JIT profiling feature”</a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="filename">opreport</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | This is the main tool for retrieving useful profile data, described in | 
|  | <a class="xref" href="#opreport" title="2. Image summaries and symbol summaries (opreport)">Section 2, “Image summaries and symbol summaries (<span class="command"><strong>opreport</strong></span>)”</a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="filename">opannotate</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | This utility can be used to produce annotated source, assembly or mixed source/assembly. | 
|  | Source level annotation is available only if the application was compiled with | 
|  | debugging symbols. See <a class="xref" href="#opannotate" title="3. Outputting annotated source (opannotate)">Section 3, “Outputting annotated source (<span class="command"><strong>opannotate</strong></span>)”</a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="filename">opgprof</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | This utility can output gprof-style data files for a binary, for use with | 
|  | <span class="command"><strong>gprof -p</strong></span>. See <a class="xref" href="#opgprof" title="5. gprof-compatible output (opgprof)">Section 5, “<span class="command"><strong>gprof</strong></span>-compatible output (<span class="command"><strong>opgprof</strong></span>)”</a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="filename">oparchive</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | This utility can be used to collect executables, debuginfo, | 
|  | and sample files and copy the files into an archive. | 
|  | The archive is self-contained and can be moved to another | 
|  | machine for further analysis. | 
|  | See <a class="xref" href="#oparchive" title="6. Archiving measurements (oparchive)">Section 6, “Archiving measurements (<span class="command"><strong>oparchive</strong></span>)”</a>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="filename">opimport</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | This utility converts sample database files from a foreign binary format (abi) to | 
|  | the native format. This is useful only when moving sample files between hosts, | 
|  | for analysis on platforms other than the one used for collection. | 
|  | See <a class="xref" href="#opimport" title="7. Converting sample database files (opimport)">Section 7, “Converting sample database files (<span class="command"><strong>opimport</strong></span>)”</a>. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="chapter" title="Chapter 3. Controlling the profiler"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title"><a id="controlling"></a>Chapter 3. Controlling the profiler</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="toc"> | 
|  | <p> | 
|  | <b>Table of Contents</b> | 
|  | </p> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#controlling-daemon">1. Using <span class="command"><strong>opcontrol</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opcontrolexamples">1.1. Examples</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#eventspec">1.2. Specifying performance counter events</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#setup-jit">2. Setting up the JIT profiling feature</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#setup-jit-jvm">2.1. JVM instrumentation</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#oprofile-gui">3. Using <span class="command"><strong>oprof_start</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#detailed-parameters">4. Configuration details</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#hardware-counters">4.1. Hardware performance counters</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#rtc">4.2. OProfile in RTC mode</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#timer">4.3. OProfile in timer interrupt mode</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#p4">4.4. Pentium 4 support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#ia64">4.5. Intel Itanium 2 support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#ppc64">4.6. PowerPC64 support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#cell-be">4.7. Cell Broadband Engine support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#misuse">4.9. Dangerous counter settings</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | <div class="sect1" title="1. Using opcontrol"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="controlling-daemon"></a>1. Using <span class="command"><strong>opcontrol</strong></span></h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | In this section we describe the configuration and control of the profiling system | 
|  | with opcontrol in more depth. | 
|  | The <span class="command"><strong>opcontrol</strong></span> script has a default setup, but you | 
|  | can alter this with the options given below. In particular, | 
|  | if your hardware supports performance counters, you can configure them. | 
|  | There are a number of counters (for example, counter 0 and counter 1 | 
|  | on the Pentium III). Each of these counters can be programmed with | 
|  | an event to count, such as cache misses or MMX operations. The event | 
|  | chosen for each counter is reflected in the profile data collected | 
|  | by OProfile: functions and binaries at the top of the profiles reflect | 
|  | that most of the chosen events happened within that code. | 
|  | </p> | 
|  | <p> | 
|  | Additionally, each counter has a "count" value: this corresponds to how | 
|  | detailed the profile is. The lower the value, the more frequently profile | 
|  | samples are taken. A counter can choose to sample only kernel code, user-space code, | 
|  | or both (both is the default). Finally, some events have a "unit mask" | 
|  | - this is a value that further restricts the types of event that are counted. | 
|  | The event types and unit masks for your CPU are listed by <span class="command"><strong>opcontrol | 
|  | --list-events</strong></span>. | 
|  | </p> | 
|  | <p> | 
|  | The <span class="command"><strong>opcontrol</strong></span> script provides the following actions : | 
|  | </p> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--init</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Loads the OProfile module if required and makes the OProfile driver | 
|  | interface available. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--setup</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Followed by a list of arguments for profiling setup. The list of arguments is | 
|  | saved in <code class="filename">/root/.oprofile/daemonrc</code>. | 
|  | Giving this option is not necessary; you can just directly pass one | 
|  | of the setup options, e.g. <span class="command"><strong>opcontrol --no-vmlinux</strong></span>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--status</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show configuration information. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--start-daemon</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Start the oprofile daemon without starting actual profiling. The profiling | 
|  | can then be started using <code class="option">--start</code>. This is useful for avoiding | 
|  | measuring the cost of daemon startup, as <code class="option">--start</code> is a simple | 
|  | write to a file in oprofilefs. Not available in 2.2/2.4 kernels. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--start</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Start data collection with either arguments provided by <code class="option">--setup</code> | 
|  | or information saved in <code class="filename">/root/.oprofile/daemonrc</code>. Specifying | 
|  | the additional <code class="option">--verbose</code> option makes the daemon generate lots of debug data | 
|  | whilst it is running. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--dump</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Force a flush of the collected profiling data to the daemon. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--stop</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Stop data collection (this separate step is not possible with 2.2 or 2.4 kernels). | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--shutdown</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Stop data collection and kill the daemon. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--reset</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Clears out data from current session, but leaves saved sessions. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--save=</code>session_name</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Save data from current session to session_name. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--deinit</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Shuts down the daemon, then unloads the OProfile module and oprofilefs. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--list-events</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | List event types and unit masks. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--help</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Generate usage messages. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | <p> | 
|  | There are a number of possible settings, of which, only | 
|  | <code class="option">--vmlinux</code> (or <code class="option">--no-vmlinux</code>) | 
|  | is required. These settings are stored in <code class="filename">~/.oprofile/daemonrc</code>. | 
|  | </p> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--buffer-size=</code>num</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Number of samples in kernel buffer. When using a 2.6 kernel, | 
|  | the buffer watershed needs to be adjusted when changing this value. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--buffer-watershed=</code>num</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Set kernel buffer watershed to num samples (2.6 only). When only | 
|  | buffer-size - buffer-watershed free entries remain in the kernel buffer, data will be | 
|  | flushed to the daemon. The most useful values are in the range [0.25 - 0.5] * buffer-size. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--cpu-buffer-size=</code>num</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Number of samples in kernel per-cpu buffer (2.6 only). If you | 
|  | profile at a high rate, it can help to increase this if the log | 
|  | file shows an excessive count of samples lost due to cpu buffer overflow. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--event=</code>[eventspec]</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Use the given performance counter event to profile. | 
|  | See <a class="xref" href="#eventspec" title="1.2. Specifying performance counter events">Section 1.2, “Specifying performance counter events”</a> below. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--session-dir=</code>dir_path</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Create or use the sample database in directory <code class="filename">dir_path</code> instead of | 
|  | the default location (/var/lib/oprofile). | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--separate=</code>[none,lib,kernel,thread,cpu,all]</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | By default, every profile is stored in a single file. Thus, for example, | 
|  | samples in the C library are all credited to the <code class="filename">/lib/libc.o</code> | 
|  | profile. However, you can choose to create separate sample files by specifying | 
|  | one of the options below. | 
|  | </p> | 
|  | <div class="informaltable"> | 
|  | <table border="1"> | 
|  | <colgroup> | 
|  | <col /> | 
|  | <col /> | 
|  | </colgroup> | 
|  | <tbody> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">none</code> | 
|  | </td> | 
|  | <td>No profile separation (default)</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">lib</code> | 
|  | </td> | 
|  | <td>Create per-application profiles for libraries</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">kernel</code> | 
|  | </td> | 
|  | <td>Create per-application profiles for the kernel and kernel modules</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">thread</code> | 
|  | </td> | 
|  | <td>Create profiles for each thread and each task</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">cpu</code> | 
|  | </td> | 
|  | <td>Create profiles for each CPU</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">all</code> | 
|  | </td> | 
|  | <td>All of the above options</td> | 
|  | </tr> | 
|  | </tbody> | 
|  | </table> | 
|  | </div> | 
|  | <p> | 
|  | Note that <code class="option">--separate=kernel</code> also turns on <code class="option">--separate=lib</code>. | 
|  |  | 
|  | When using <code class="option">--separate=kernel</code>, samples in hardware interrupts, soft-irqs, or other | 
|  | asynchronous kernel contexts are credited to the task currently running. This means you will see | 
|  | seemingly nonsense profiles such as <code class="filename">/bin/bash</code> showing samples for the PPP modules, | 
|  | etc. | 
|  | </p> | 
|  | <p> | 
|  | On 2.2/2.4 only kernel threads already started when profiling begins are correctly profiled; | 
|  | newly started kernel thread samples are credited to the vmlinux (kernel) profile. | 
|  | </p> | 
|  | <p> | 
|  | Using <code class="option">--separate=thread</code> creates a lot | 
|  | of sample files if you leave OProfile running for a while; it's most | 
|  | useful when used for short sessions, or when using image filtering. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--callgraph=</code>#depth</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Enable call-graph sample collection with a maximum depth. Use 0 to disable | 
|  | callgraph profiling.  NOTE: Callgraph support is available on a limited | 
|  | number of platforms at this time; for example: | 
|  | </p> | 
|  | <div class="itemizedlist"> | 
|  | <ul class="itemizedlist" type="disc"> | 
|  | <li class="listitem"> | 
|  | <p>x86 with recent 2.6 kernel</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>ARM with recent 2.6 kernel</p> | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | <p>PowerPC with 2.6.17 kernel</p> | 
|  | </li> | 
|  | </ul> | 
|  | </div> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--image=</code>image,[images]|"all"</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Image filtering. If you specify one or more absolute | 
|  | paths to binaries, OProfile will only produce profile results for those | 
|  | binary images. This is useful for restricting the sometimes voluminous | 
|  | output you may get otherwise, especially with | 
|  | <code class="option">--separate=thread</code>. Note that if you are using | 
|  | <code class="option">--separate=lib</code> or | 
|  | <code class="option">--separate=kernel</code>, then if you specify an | 
|  | application binary, the shared libraries and kernel code | 
|  | <span class="emphasis"><em>are</em></span> included. Specify the value | 
|  | "all" to profile everything (the default). | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--vmlinux=</code>file</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Path to the vmlinux kernel image for the running kernel. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--no-vmlinux</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Use this when you don't have a kernel vmlinux file, and you don't want | 
|  | to profile the kernel. This still counts the total number of kernel samples, | 
|  | but can't give symbol-based results for the kernel or any modules. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
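|  | <p> | 
|  | The guidance above on <code class="option">--buffer-watershed</code> (between 0.25 and 0.5 times the buffer size) can be turned into a quick calculation. This is a minimal sketch: suggest_watershed is a name invented here, and 65536 is an arbitrary example buffer size, not a recommended value. | 
|  | </p> | 
```shell
# suggest_watershed is a hypothetical helper, not an OProfile tool.
# $1: the value you plan to pass to --buffer-size.
# Prints the low and high ends of the 0.25 to 0.5 range quoted above,
# using integer arithmetic (fractions are rounded down).
suggest_watershed() {
    size=$1
    echo "$((size / 4)) $((size / 2))"
}

# Example: bounds for a 65536-sample kernel buffer.
suggest_watershed 65536   # prints "16384 32768"
```
|  | <p> | 
|  | With those numbers, one possible setup on a 2.6 kernel would be <span class="command"><strong>opcontrol --buffer-size=65536 --buffer-watershed=16384</strong></span>; whether that suits your workload is something to check against the daemon log. | 
|  | </p> | 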
|  | <div class="sect2" title="1.1. Examples"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opcontrolexamples"></a>1.1. Examples</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect3" title="1.1.1. Intel performance counter setup"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h4 class="title"><a id="examplesperfctr"></a>1.1.1. Intel performance counter setup</h4> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Here, we have a Pentium III running at 800MHz, and we want to look at where data memory | 
|  | references are happening most, and also get results for CPU time. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opcontrol --event=CPU_CLK_UNHALTED:400000 --event=DATA_MEM_REFS:10000 | 
|  | # opcontrol --vmlinux=/boot/2.6.0/vmlinux | 
|  | # opcontrol --start | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
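|  | <p> | 
|  | To relate the count values in this example to a sampling rate: a counter produces one sample each time it has counted "count" events. For <code class="constant">CPU_CLK_UNHALTED</code> on the 800MHz processor above, that gives an upper bound on samples per second while the CPU is busy. The helper name samples_per_second below is invented for this sketch. | 
|  | </p> | 
```shell
# samples_per_second is a hypothetical helper, not an OProfile tool.
# $1: events per second (for CPU_CLK_UNHALTED, the clock rate in Hz while busy).
# $2: the count value given in the event specification.
# Prints the resulting samples per second, rounded down.
samples_per_second() {
    events_per_second=$1
    count=$2
    echo $((events_per_second / count))
}

# 800MHz clock with CPU_CLK_UNHALTED:400000
samples_per_second 800000000 400000   # prints 2000
```
|  | <p> | 
|  | The same reasoning applies to <code class="constant">DATA_MEM_REFS:10000</code>, which yields one sample per 10000 data memory references; its rate per second depends on how memory-intensive the running code is. | 
|  | </p> | 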
|  | </div> | 
|  | <div class="sect3" title="1.1.2. RTC mode"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h4 class="title"><a id="examplesrtc"></a>1.1.2. RTC mode</h4> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Here, we have an Intel laptop without support for performance counters, running a 2.4 kernel. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # ophelp -r | 
|  | CPU with RTC device | 
|  | # opcontrol --vmlinux=/boot/2.4.13/vmlinux --event=RTC_INTERRUPTS:1024 | 
|  | # opcontrol --start | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | </div> | 
|  | <div class="sect3" title="1.1.3. Starting the daemon separately"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h4 class="title"><a id="examplesstartdaemon"></a>1.1.3. Starting the daemon separately</h4> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | If we're running 2.6 kernels, we can use <code class="option">--start-daemon</code> to avoid | 
|  | the profiler startup affecting results. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opcontrol --vmlinux=/boot/2.6.0/vmlinux | 
|  | # opcontrol --start-daemon | 
|  | # my_favourite_benchmark --init | 
|  | # opcontrol --start ; my_favourite_benchmark --run ; opcontrol --stop | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | </div> | 
|  | <div class="sect3" title="1.1.4. Separate profiles for libraries and the kernel"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h4 class="title"><a id="exampleseparate"></a>1.1.4. Separate profiles for libraries and the kernel</h4> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Here, we want to see a profile of the OProfile daemon itself, including when | 
|  | it was running inside the kernel driver, and its use of shared libraries. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opcontrol --separate=kernel --vmlinux=/boot/2.6.0/vmlinux | 
|  | # opcontrol --start | 
|  | # my_favourite_stress_test --run | 
|  | # opreport -l -p /lib/modules/2.6.0/kernel /usr/local/bin/oprofiled | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | </div> | 
|  | <div class="sect3" title="1.1.5. Profiling sessions"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h4 class="title"><a id="examplessessions"></a>1.1.5. Profiling sessions</h4> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | It can often be useful to split up profiling data into several different | 
|  | time periods. For example, you may want to collect data on an application's | 
|  | startup separately from the normal runtime data. You can use the simple | 
|  | command <span class="command"><strong>opcontrol --save</strong></span> to do this. For example : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opcontrol --save=blah | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | will create a sub-directory in <code class="filename">$SESSION_DIR/samples</code> containing the samples | 
|  | up to that point (the current session's sample files are moved into this | 
|  | directory). You can then pass this session name as a parameter to the post-profiling | 
|  | analysis tools, to only get data up to the point you named the | 
|  | session. If you do not want to save a session, you can do | 
|  | <span class="command"><strong>rm -rf $SESSION_DIR/samples/sessionname</strong></span> or, for the | 
|  | current session, <span class="command"><strong>opcontrol --reset</strong></span>. | 
|  | </p> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect2" title="1.2. Specifying performance counter events"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="eventspec"></a>1.2. Specifying performance counter events</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The <code class="option">--event</code> option to <span class="command"><strong>opcontrol</strong></span> | 
|  | takes a specification that indicates how the details of each | 
|  | hardware performance counter should be set up. If you want to | 
|  | revert to OProfile's default setting (<code class="option">--event</code> | 
|  | is strictly optional), use <code class="option">--event=default</code>. Use of this | 
|  | option overrides all previous event selections. | 
|  | </p> | 
|  | <p> | 
|  | You can pass multiple event specifications. OProfile will allocate | 
|  | hardware counters as necessary. Note that some combinations are not | 
|  | allowed by the CPU; running <span class="command"><strong>opcontrol --list-events</strong></span> gives the details | 
|  | of each event. The event specification is a colon-separated string | 
|  | of the form <code class="option"><span class="emphasis"><em>name</em></span>:<span class="emphasis"><em>count</em></span>:<span class="emphasis"><em>unitmask</em></span>:<span class="emphasis"><em>kernel</em></span>:<span class="emphasis"><em>user</em></span></code> as described in this table: | 
|  | </p> | 
|  | <div class="informaltable"> | 
|  | <table border="1"> | 
|  | <colgroup> | 
|  | <col /> | 
|  | <col /> | 
|  | </colgroup> | 
|  | <tbody> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">name</code> | 
|  | </td> | 
|  | <td>The symbolic event name, e.g. <code class="constant">CPU_CLK_UNHALTED</code></td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">count</code> | 
|  | </td> | 
|  | <td>The counter reset value, e.g. 100000</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">unitmask</code> | 
|  | </td> | 
|  | <td>The unit mask, as given in the events list: e.g. 0x0f; or a symbolic name as | 
|  | given by the first word of the description (only valid for unit masks having an "extra:" parameter)</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">kernel</code> | 
|  | </td> | 
|  | <td>Whether to profile kernel code</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td> | 
|  | <code class="option">user</code> | 
|  | </td> | 
|  | <td>Whether to profile userspace code</td> | 
|  | </tr> | 
|  | </tbody> | 
|  | </table> | 
|  | </div> | 
|  | <p> | 
|  | The last three values are optional; if you omit them (e.g. <code class="option">--event=DATA_MEM_REFS:30000</code>), | 
|  | they will be set to the default values (a unit mask of 0, and profiling both kernel and | 
|  | userspace code). Note that some events require a unit mask. | 
|  | </p> | 
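|  | <p> | 
|  | The defaulting rule above can be sketched as a small shell function. This only illustrates | 
|  | the specification format and is not part of OProfile: the function name | 
|  | <code class="option">expand_event_spec</code> is hypothetical, and the defaults it fills in are the ones | 
|  | stated above (a unit mask of 0, and 1 for both the kernel and user fields). | 
|  | </p> | 

```shell
# Hypothetical helper: fill in the optional fields of an event specification.
# The body is a subshell "( ... )", so IFS and the positional parameters
# changed here do not leak into the calling shell.
expand_event_spec() (
    IFS=:
    set -f        # no filename expansion while splitting
    set -- $1     # split name:count[:unitmask[:kernel[:user]]] on ':'
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "${3:-0}" "${4:-1}" "${5:-1}"
)

expand_event_spec DATA_MEM_REFS:30000   # prints DATA_MEM_REFS:30000:0:1:1
```

|  | <p> | 
|  | Events that require a unit mask still need one to be given explicitly; the sketch does not | 
|  | know which events those are. | 
|  | </p> | 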
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> | 
|  | <h3 class="title">Note</h3> | 
|  | <p> | 
|  | For the PowerPC platforms, all events specified must be in the same group; i.e., the group number | 
|  | appended to the event name (e.g. <code class="constant"><<span class="emphasis"><em>some-event-name</em></span>>_GRP9</code>) must be the same. | 
|  | </p> | 
|  | </div> | 
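|  | <p> | 
|  | As a rough illustration of the PowerPC group rule, the following shell sketch checks that a | 
|  | list of event specifications all carry the same group suffix before they are passed to | 
|  | <span class="command"><strong>opcontrol</strong></span>. The function name <code class="option">same_ppc_group</code> | 
|  | is hypothetical, and the event names in the usage comment are only examples; check | 
|  | <span class="command"><strong>opcontrol --list-events</strong></span> for the names valid on your processor. | 
|  | </p> | 

```shell
# Hypothetical helper: succeed only if every event name ends in the same _GRP<n>.
# The body is a subshell "( ... )", so "exit" only leaves the function.
same_ppc_group() (
    group=
    for spec in "$@"; do
        name=${spec%%:*}                  # drop :count:unitmask:kernel:user
        case $name in
            *_GRP[0-9]*) this=${name##*_GRP} ;;
            *) exit 1 ;;                  # no group suffix at all
        esac
        if [ -z "$group" ]; then
            group=$this
        elif [ "$group" != "$this" ]; then
            exit 1                        # two different groups
        fi
    done
    [ -n "$group" ]                       # fail when no events were given
)

# Example event names (illustrative only):
#   same_ppc_group PM_CYC_GRP1:10000 PM_INST_CMPL_GRP1:10000   -> succeeds
#   same_ppc_group PM_CYC_GRP1:10000 PM_INST_CMPL_GRP2:10000   -> fails
```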
|  | <p> | 
|  | If OProfile is using RTC mode, and you want to alter the default counter value, | 
|  | you can use something like <code class="option">--event=RTC_INTERRUPTS:2048</code>. Note the last | 
|  | three values here are ignored. | 
|  | If OProfile is using timer-interrupt mode, there is no configuration possible. | 
|  | </p> | 
|  | <p> | 
|  | The table below lists the events selected by default | 
|  | (<code class="option">--event=default</code>) for the various computer architectures: | 
|  | </p> | 
|  | <div class="informaltable"> | 
|  | <table border="1"> | 
|  | <colgroup> | 
|  | <col /> | 
|  | <col /> | 
|  | <col /> | 
|  | </colgroup> | 
|  | <tbody> | 
|  | <tr> | 
|  | <td>Processor</td> | 
|  | <td>cpu_type</td> | 
|  | <td>Default event</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Alpha EV4</td> | 
|  | <td>alpha/ev4</td> | 
|  | <td>CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Alpha EV5</td> | 
|  | <td>alpha/ev5</td> | 
|  | <td>CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Alpha PCA56</td> | 
|  | <td>alpha/pca56</td> | 
|  | <td>CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Alpha EV6</td> | 
|  | <td>alpha/ev6</td> | 
|  | <td>CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Alpha EV67</td> | 
|  | <td>alpha/ev67</td> | 
|  | <td>CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>ARM/XScale PMU1</td> | 
|  | <td>arm/xscale1</td> | 
|  | <td>CPU_CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>ARM/XScale PMU2</td> | 
|  | <td>arm/xscale2</td> | 
|  | <td>CPU_CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>ARM/MPCore</td> | 
|  | <td>arm/mpcore</td> | 
|  | <td>CPU_CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>AVR32</td> | 
|  | <td>avr32</td> | 
|  | <td>CPU_CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Athlon</td> | 
|  | <td>i386/athlon</td> | 
|  | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Pentium Pro</td> | 
|  | <td>i386/ppro</td> | 
|  | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Pentium II</td> | 
|  | <td>i386/pii</td> | 
|  | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Pentium III</td> | 
|  | <td>i386/piii</td> | 
|  | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Pentium M (P6 core)</td> | 
|  | <td>i386/p6_mobile</td> | 
|  | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Pentium 4 (non-HT)</td> | 
|  | <td>i386/p4</td> | 
|  | <td>GLOBAL_POWER_EVENTS:100000:1:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Pentium 4 (HT)</td> | 
|  | <td>i386/p4-ht</td> | 
|  | <td>GLOBAL_POWER_EVENTS:100000:1:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Hammer</td> | 
|  | <td>x86-64/hammer</td> | 
|  | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Family10h</td> | 
|  | <td>x86-64/family10</td> | 
|  | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Family11h</td> | 
|  | <td>x86-64/family11h</td> | 
|  | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Itanium</td> | 
|  | <td>ia64/itanium</td> | 
|  | <td>CPU_CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>Itanium 2</td> | 
|  | <td>ia64/itanium2</td> | 
|  | <td>CPU_CYCLES:100000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>TIMER_INT</td> | 
|  | <td>timer</td> | 
|  | <td>None selectable</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>IBM iseries</td> | 
|  | <td>PowerPC 4/5/970</td> | 
|  | <td>CYCLES:10000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>IBM pseries</td> | 
|  | <td>PowerPC 4/5/970/Cell</td> | 
|  | <td>CYCLES:10000:0:1:1</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>IBM s390</td> | 
|  | <td>timer</td> | 
|  | <td>None selectable</td> | 
|  | </tr> | 
|  | <tr> | 
|  | <td>IBM s390x</td> | 
|  | <td>timer</td> | 
|  | <td>None selectable</td> | 
|  | </tr> | 
|  | </tbody> | 
|  | </table> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="2. Setting up the JIT profiling feature"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="setup-jit"></a>2. Setting up the JIT profiling feature</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | To gather information about JITed code from a virtual machine, | 
|  | it needs to be instrumented with an agent library. We use the | 
|  | agent libraries for Java in the following example. To use the | 
|  | Java profiling feature, you must build OProfile with the "--with-java" option | 
|  | (<a class="xref" href="#install" title="4. Installation">Section 4, “Installation”</a>). | 
|  |  | 
|  | </p> | 
|  | <div class="sect2" title="2.1. JVM instrumentation"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="setup-jit-jvm"></a>2.1. JVM instrumentation</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Add this to the startup parameters of the JVM (for JVMTI): | 
|  |  | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentpath:<libdir>/libjvmti_oprofile.so[=<options>]</code> </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | or | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentlib:jvmti_oprofile[=<options>]</code> </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | </p> | 
|  | <p> | 
|  | The JVMPI agent implementation is enabled with the command line option | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-Xrunjvmpi_oprofile[:<options>]</code> </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | </p> | 
|  | <p> | 
|  | Currently, there is just one option available -- <code class="option">debug</code>. For JVMPI, | 
|  | the convention for specifying an option is <code class="option">option_name=[yes|no]</code>. | 
|  | For JVMTI, the option specification is simply the option name, implying | 
|  | "yes"; no option specified implies "no". | 
|  | </p> | 
|  | <p> | 
|  | The agent library (installed in <code class="filename"><oprof_install_dir>/lib/oprofile</code>) | 
|  | needs to be in the library search path (e.g. add the library directory | 
|  | to <code class="constant">LD_LIBRARY_PATH</code>). If the command line of | 
|  | the JVM is not directly accessible (for example, because it is buried within shell scripts or a | 
|  | launcher program), it may be possible to set an environment variable to add | 
|  | the instrumentation instead. | 
|  | For Sun JVMs this is <code class="constant">JAVA_TOOL_OPTIONS</code>. Please check | 
|  | your JVM documentation for | 
|  | further information on the agent startup options. | 
|  | </p> | 
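|  | <p> | 
|  | A minimal sketch of the environment-variable approach for a Sun JVM follows. The helper name | 
|  | <code class="option">jvmti_agent_option</code> and the variable <code class="constant">OPROF_LIBDIR</code> | 
|  | are hypothetical, and the default directory is only an assumed install location; substitute the | 
|  | directory where your agent library was actually installed. | 
|  | </p> | 

```shell
# Hypothetical helper: build the JVMTI agent option for a given library directory.
#   $1: directory containing libjvmti_oprofile.so
#   $2: optional agent option (currently only "debug")
jvmti_agent_option() {
    if [ -n "${2:-}" ]; then
        printf '%s\n' "-agentpath:$1/libjvmti_oprofile.so=$2"
    else
        printf '%s\n' "-agentpath:$1/libjvmti_oprofile.so"
    fi
}

# Assumed install location; replace with your own agent library directory.
OPROF_LIBDIR=${OPROF_LIBDIR:-/usr/local/lib/oprofile}

# Sun JVMs read additional startup options from JAVA_TOOL_OPTIONS.
JAVA_TOOL_OPTIONS=$(jvmti_agent_option "$OPROF_LIBDIR" debug)
export JAVA_TOOL_OPTIONS
```

|  | <p> | 
|  | Because <code class="option">-agentpath</code> names the library by its full path, this variant does not | 
|  | depend on the library search path; the <code class="option">-agentlib</code> form does. | 
|  | </p> | 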
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="3. Using oprof_start"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="oprofile-gui"></a>3. Using <span class="command"><strong>oprof_start</strong></span></h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The <span class="command"><strong>oprof_start</strong></span> application provides a convenient way to start the profiler. | 
|  | Note that <span class="command"><strong>oprof_start</strong></span> is just a wrapper around the <span class="command"><strong>opcontrol</strong></span> script, | 
|  | so it does not provide more services than the script itself. | 
|  | </p> | 
|  | <p> | 
|  | After <span class="command"><strong>oprof_start</strong></span> is started you can select the event type for each counter; | 
|  | the sampling rate and other related parameters are explained in <a class="xref" href="#controlling-daemon" title="1. Using opcontrol">Section 1, “Using <span class="command"><strong>opcontrol</strong></span>”</a>. | 
|  | The "Configuration" section allows you to set general parameters such as the buffer size, kernel filename | 
|  | etc. The counter setup interface should be self-explanatory; <a class="xref" href="#hardware-counters" title="4.1. Hardware performance counters">Section 4.1, “Hardware performance counters”</a> and related | 
|  | links contain information on using unit masks. | 
|  | </p> | 
|  | <p> | 
|  | A status line shows the current status of the profiler: how long it has been running, the average | 
|  | number of interrupts received per second, and the total number of interrupts, over all processors. | 
|  | Note that quitting <span class="command"><strong>oprof_start</strong></span> does not stop the profiler. | 
|  | </p> | 
|  | <p> | 
|  | Your configuration is saved in the same file as <span class="command"><strong>opcontrol</strong></span> uses; that is, | 
|  | <code class="filename">~/.oprofile/daemonrc</code>. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect1" title="4. Configuration details"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="detailed-parameters"></a>4. Configuration details</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect2" title="4.1. Hardware performance counters"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="hardware-counters"></a>4.1. Hardware performance counters</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> | 
|  | <h3 class="title">Note</h3> | 
|  | <p> | 
|  | Your CPU type may not include the requisite support for hardware performance counters, in which case | 
|  | you must use OProfile in RTC mode in 2.4 (see <a class="xref" href="#rtc" title="4.2. OProfile in RTC mode">Section 4.2, “OProfile in RTC mode”</a>), or timer mode in 2.6 (see <a class="xref" href="#timer" title="4.3. OProfile in timer interrupt mode">Section 4.3, “OProfile in timer interrupt mode”</a>). | 
|  | You do not really need to read this section unless you are interested in using | 
|  | events other than the default event chosen by OProfile. | 
|  | </p> | 
|  | </div> | 
|  | <p> | 
|  | The Intel hardware performance counters are detailed in the Intel IA-32 Architecture Manual, Volume 3, available | 
|  | from <a class="ulink" href="http://developer.intel.com/">http://developer.intel.com/</a>. | 
|  | The AMD Athlon/Opteron/Phenom/Turion implementation is detailed in <a class="ulink" href="http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf"> | 
|  | http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf</a>. | 
|  | For PowerPC64 processors in IBM iSeries, pSeries, and blade server systems, processor documentation | 
|  | is available at <a class="ulink" href="http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC/"> | 
|  | http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC</a>.  (For example, the | 
|  | specific publication containing information on the performance monitor unit for the PowerPC970 is | 
|  | "IBM PowerPC 970FX RISC Microprocessor User's Manual.") | 
|  | These processors are capable of delivering an interrupt when a counter overflows. | 
|  | This is the basic mechanism on which OProfile is based. The delivery mode is <acronym class="acronym">NMI</acronym>, | 
|  | so blocking interrupts in the kernel does not prevent profiling. When the interrupt handler is called, | 
|  | the current <acronym class="acronym">PC</acronym> value and the current task are recorded into the profiling structure. | 
|  | This allows the overflow event to be attached to a specific assembly instruction in a binary image. | 
|  | The daemon receives this data from the kernel, and writes it to the sample files. | 
|  | </p> | 
|  | <p> | 
|  | If we use an event such as <code class="constant">CPU_CLK_UNHALTED</code> or <code class="constant">INST_RETIRED</code> | 
|  | (<code class="constant">GLOBAL_POWER_EVENTS</code> or <code class="constant">INSTR_RETIRED</code>, respectively, on the Pentium 4), we can | 
|  | use the overflow counts as an estimate of actual time spent in each part of code. Alternatively we can profile interesting | 
|  | data such as the cache behaviour of routines with the other available counters. | 
|  | </p> | 
|  | <p> | 
|  | However there are several caveats. First, there are those issues listed in the Intel manual. There is a delay | 
|  | between the counter overflow and the interrupt delivery that can skew results on a small scale - this means | 
|  | you cannot rely on the profiles at the instruction level as being perfectly accurate. | 
|  | If you are using an "event-mode" counter such as the cache counters, a count registered against an instruction doesn't mean | 
|  | that the instruction is responsible for that event. However, it implies that the counter overflowed in the dynamic | 
|  | vicinity of that instruction, to within a few instructions. Further details on this problem can be found in | 
|  | <a class="xref" href="#interpreting" title="Chapter 5. Interpreting profiling results">Chapter 5, <i>Interpreting profiling results</i></a> and also in the Digital paper "ProfileMe: A Hardware Performance Counter". | 
|  | </p> | 
|  | <p> | 
|  | Each counter has several configuration parameters. | 
|  | First, there is the unit mask: this simply further specifies what to count. | 
|  | Second, there is the counter value, discussed below. Third, there are settings that control whether counts are incremented | 
|  | whilst in kernel space, in user space, or both. You can configure these separately for each counter. | 
|  | </p> | 
|  | <p> | 
|  | The counter value sets the sampling interval: after each overflow event, the counter is re-initialized | 
|  | such that another overflow will occur after that many events have been counted. Thus, higher | 
|  | values mean less-detailed profiling, and lower values mean more detail, but higher overhead. | 
|  | Picking a good value for this | 
|  | parameter is, unfortunately, somewhat of a black art. It is of course dependent on the event | 
|  | you have chosen. | 
|  | Specifying too large a value will mean not enough interrupts are generated | 
|  | to give a realistic profile (though this problem can be ameliorated by profiling for <span class="emphasis"><em>longer</em></span>). | 
|  | Specifying too small a value can lead to higher performance overhead. | 
|  | </p> | 
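|  | <p> | 
|  | To get a feel for the trade-off, the interrupt rate can be estimated as the event rate divided | 
|  | by the counter value. The sketch below does this arithmetic in the shell; the function name | 
|  | <code class="option">samples_per_second</code> is hypothetical, and the 2 GHz clock is an assumed figure | 
|  | chosen only for illustration. | 
|  | </p> | 

```shell
# Approximate overflow interrupts per second on one fully busy CPU.
#   $1: events per second (for a cycle-counting event, the clock frequency in Hz)
#   $2: the counter value given in the event specification
samples_per_second() {
    echo $(( $1 / $2 ))
}

samples_per_second 2000000000 100000   # assumed 2 GHz clock, count 100000: prints 20000
samples_per_second 2000000000 10000    # ten times smaller count: prints 200000
```

|  | <p> | 
|  | Events that occur far less often than clock cycles, such as cache misses, need a correspondingly | 
|  | smaller counter value to yield a usable number of samples. | 
|  | </p> | 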
|  | </div> | 
|  | <div class="sect2" title="4.2. OProfile in RTC mode"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="rtc"></a>4.2. OProfile in RTC mode</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> | 
|  | <h3 class="title">Note</h3> | 
|  | <p> | 
|  | This section applies to 2.2/2.4 kernels only. | 
|  | </p> | 
|  | </div> | 
|  | <p> | 
|  | Some CPU types do not provide the needed hardware support to use the hardware performance counters. This includes | 
|  | some laptops, classic Pentiums, and other CPU types not yet supported by OProfile (such as Cyrix). | 
|  | On these machines, OProfile falls | 
|  | back to using the real-time clock interrupt to collect samples. This interrupt is also used by the <span class="command"><strong>rtc</strong></span> | 
|  | module: you cannot have both the OProfile and rtc modules loaded, nor can you use this mode with rtc support compiled into the kernel. | 
|  | </p> | 
|  | <p> | 
|  | RTC mode is less capable than the hardware counters mode; in particular, it is unable to profile sections of | 
|  | the kernel where interrupts are disabled. There is just one available event, "RTC interrupts", and its value | 
|  | corresponds to the number of interrupts generated per second (that is, a higher number means a better profiling | 
|  | resolution, and higher overhead). The current implementation of the real-time clock supports only power-of-two | 
|  | sampling rates from 2 to 4096 per second.  Other values within this range are rounded to the nearest power of | 
|  | two. | 
|  | </p> | 
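|  | <p> | 
|  | The rounding described above can be illustrated with a short shell sketch. The function name | 
|  | <code class="option">nearest_rtc_rate</code> is hypothetical, the input is assumed to lie between 2 and | 
|  | 4096, and rounding a value that falls exactly halfway between two powers of two upwards is an | 
|  | assumption of this sketch; the text above does not say how the driver treats that case. | 
|  | </p> | 

```shell
# Hypothetical helper: nearest power of two to $1, for $1 between 2 and 4096.
# Assumption: a value exactly halfway between two powers of two rounds up.
nearest_rtc_rate() (
    want=$1
    lower=2
    # Find the largest power of two that does not exceed the requested rate.
    while [ "$lower" -lt 4096 ] && [ $((lower * 2)) -le "$want" ]; do
        lower=$((lower * 2))
    done
    upper=$((lower * 2))
    if [ "$upper" -gt 4096 ]; then
        upper=4096
    fi
    if [ $((want - lower)) -lt $((upper - want)) ]; then
        echo "$lower"
    else
        echo "$upper"
    fi
)

nearest_rtc_rate 1000   # prints 1024
nearest_rtc_rate 700    # prints 512
```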
|  | <p> | 
|  | You can force use of the RTC interrupt with the <code class="option">force_rtc=1</code> module parameter. | 
|  | </p> | 
|  | <p> | 
|  | Setting the value from the GUI should be straightforward. On the command line, you need to specify the | 
|  | event to <span class="command"><strong>opcontrol</strong></span>, e.g.: | 
|  | </p> | 
|  | <p> | 
|  | <span class="command"> | 
|  | <strong>opcontrol --event=RTC_INTERRUPTS:256</strong> | 
|  | </span> | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="4.3. OProfile in timer interrupt mode"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="timer"></a>4.3. OProfile in timer interrupt mode</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> | 
|  | <h3 class="title">Note</h3> | 
|  | <p> | 
|  | This section applies to 2.6 kernels and above only. | 
|  | </p> | 
|  | </div> | 
|  | <p> | 
|  | In 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver | 
|  | falls back to using the timer interrupt for profiling. Like the RTC mode in 2.4 kernels, this is not able to | 
|  | profile code that has interrupts disabled. Note that there are no configuration parameters for | 
|  | setting this, unlike the RTC and hardware performance counter setup. | 
|  | </p> | 
|  | <p> | 
|  | You can force use of the timer interrupt by using the <code class="option">timer=1</code> module | 
|  | parameter (or <code class="option">oprofile.timer=1</code> on the boot command line if OProfile is | 
|  | built-in). | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="4.4. Pentium 4 support"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="p4"></a>4.4. Pentium 4 support</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The Pentium 4 / Xeon performance counters are organized around 3 types of model specific registers (MSRs): 45 event | 
|  | selection control registers (ESCRs), 18 counter configuration control registers (CCCRs) and 18 counters. ESCRs describe a | 
|  | particular set of events which are to be recorded, and CCCRs bind ESCRs to counters and configure their | 
|  | operation. Unfortunately the relationship between these registers is quite complex; they cannot all be used with one | 
|  | another at any time. There is, however, a subset of 8 counters, 8 ESCRs, and 8 CCCRs which can be used independently of | 
|  | one another, so OProfile only accesses those registers, treating them as a bank of 8 "normal" counters, similar | 
|  | to those in the P6 or Athlon/Opteron/Phenom/Turion families of CPU. | 
|  | </p> | 
|  | <p> | 
|  | There is currently no support for Precision Event-Based Sampling (PEBS), nor any advanced uses of the Debug Store | 
|  | (DS). Current support is limited to the conservative extension of OProfile's existing interrupt-based model described | 
|  | above.  Performance monitoring hardware on Pentium 4 / Xeon processors with Hyper-Threading enabled (multiple logical | 
|  | processors on a single die) is not supported in 2.4 kernels (you can use OProfile if you disable Hyper-Threading, | 
|  | though). | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="4.5. Intel Itanium 2 support"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="ia64"></a>4.5. Intel Itanium 2 support</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The Itanium 2 performance monitoring unit (PMU) organizes the counters as four | 
|  | pairs of performance event monitoring registers. Each pair is composed of a | 
|  | Performance Monitoring Configuration (PMC) register and Performance Monitoring | 
|  | Data (PMD) register.  The PMC selects the performance event being monitored and | 
|  | the PMD determines the sampling interval. The IA64 Performance Monitoring Unit | 
|  | (PMU) triggers sampling with maskable interrupts. Thus, samples will not occur | 
|  | in sections of the IA64 kernel where interrupts are disabled. | 
|  | </p> | 
|  | <p> | 
|  | None of the advanced features of the Itanium 2 performance monitoring unit | 
|  | such as opcode matching, address range matching, or precise event sampling are | 
|  | supported by this version of OProfile.  The Itanium 2 support only maps OProfile's | 
|  | existing interrupt-based model to the PMU hardware. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="4.6. PowerPC64 support"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="ppc64"></a>4.6. PowerPC64 support</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The performance monitoring unit (PMU) for the IBM PowerPC 64-bit processors | 
|  | consists of between 4 and 8 counters (depending on the model), plus three | 
|  | special purpose registers used for programming the counters -- MMCR0, MMCR1, | 
|  | and MMCRA.  Advanced features such as instruction matching and thresholding are | 
|  | not supported by this version of OProfile. | 
|  | </p> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Later versions of the IBM POWER5+ processor (beginning with revision 3.0) | 
|  | run the performance monitor unit in POWER6 mode, effectively removing OProfile's | 
|  | access to counters 5 and 6.  These two counters are dedicated to counting | 
|  | instructions completed and cycles, respectively.  In POWER6 mode, however, the | 
|  | counters do not generate an interrupt on overflow and so are unusable by | 
|  | OProfile.  Kernel versions 2.6.23 and higher will recognize this mode | 
|  | and export "ppc64/power5++" as the cpu_type to the oprofilefs pseudo filesystem. | 
|  | OProfile userspace responds to this cpu_type by removing these counters from | 
|  | the list of potential events to count.  Without this kernel support, attempts | 
|  | to profile using an event from one of these counters will yield incorrect | 
|  | results -- typically, zero (or near zero) samples in the generated report. | 
|  | </div> | 
|  | <p> | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="4.7. Cell Broadband Engine support"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="cell-be"></a>4.7. Cell Broadband Engine support</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The Cell Broadband Engine (CBE) processor core consists of a PowerPC Processing | 
|  | Element (PPE) and 8 Synergistic Processing Elements (SPE).  PPEs and SPEs each | 
|  | consist of a processing unit (PPU and SPU, respectively) and other hardware | 
|  | components, such as memory controllers. | 
|  | </p> | 
|  | <p> | 
|  | A PPU has two hardware threads (aka "virtual CPUs").  The performance monitor | 
|  | unit of the CBE collects event information on one hardware thread at a time. | 
|  | Therefore, when profiling PPE events, | 
|  | OProfile collects the profile based on the selected events by time slicing the | 
|  | performance counter hardware between the two threads.   The user must ensure the | 
|  | collection interval is long enough so that the time spent collecting data for | 
|  | each PPU is sufficient to obtain a good profile. | 
|  | </p> | 
|  | <p> | 
|  | To profile an SPU application, the user should specify the SPU_CYCLES event. | 
|  | When starting OProfile with SPU_CYCLES, the opcontrol script enforces certain | 
|  | separation parameters (separate=cpu,lib) to ensure that sufficient information | 
|  | is collected in the sample data in order to generate a complete report.  The | 
|  | --merge=cpu option of opreport can be used to obtain a more readable report if analyzing | 
|  | the performance of each separate SPU is not necessary. | 
|  | </p> | 
|  | <p> | 
|  | Profiling with an SPU event (events 4100 through 4163) is not compatible with any other | 
|  | event.  Furthermore, only one SPU event can be specified at a time.  The hardware only | 
|  | supports profiling on one SPU per node at a time.  The OProfile kernel code time slices | 
|  | between the eight SPUs to collect data on all SPUs. | 
|  | </p> | 
|  | <p> | 
|  | SPU profile reports have some unique characteristics compared to reports for | 
|  | standard architectures: | 
|  | </p> | 
|  | <div class="itemizedlist"> | 
|  | <ul class="itemizedlist" type="disc"> | 
|  | <li class="listitem">Typically no "app name" column.  This is really standard OProfile behavior | 
|  | when the report contains samples for just a single application, which is | 
|  | commonly the case when profiling SPUs.</li> | 
|  | <li class="listitem">"CPU" equates to "SPU"</li> | 
|  | <li class="listitem">Specifying '--long-filenames' on the opreport command does not always result | 
|  | in long filenames.  This happens when the SPU application code is embedded in | 
|  | the PPE executable or shared library.  The embedded SPU ELF data contains only the | 
|  | short filename (i.e., no path information) for the SPU binary file that was used as | 
|  | the source for embedding.  Only the short filename is used because | 
|  | the original SPU binary file may not exist or be accessible at runtime.  The performance | 
|  | analyst must have sufficient knowledge of the application to be able to correlate the | 
|  | SPU binary image names found in the  report to the application's source files. | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3> | 
|  | Compile the application with -g and generate the OProfile report | 
|  | with -g to facilitate finding the right source file(s) on which to focus. | 
|  | </div></li> | 
|  | </ul> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect2" title="4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="amd-ibs-support"></a>4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Instruction-Based Sampling (IBS) is a new performance measurement technique | 
|  | available on AMD Family 10h processors. Traditional performance counter | 
|  | sampling is not precise enough to isolate performance issues to individual | 
|  | instructions. IBS, however, precisely identifies instructions which are not | 
|  | making the best use of the processor pipeline and memory hierarchy. | 
|  | For more information, please refer to the "Instruction-Based Sampling: | 
|  | A New Performance Analysis Technique for AMD Family 10h Processors" ( | 
|  | <a class="ulink" href="http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf"> | 
|  | http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf</a>). | 
|  | There are two IBS profile types, described in the following sections. | 
|  | </p> | 
|  | <div class="sect3" title="4.8.1. IBS Fetch"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h4 class="title"><a id="ibs-fetch"></a>4.8.1. IBS Fetch</h4> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | IBS fetch sampling is a statistical sampling method which counts completed | 
|  | fetch operations. When the number of completed fetch operations reaches the | 
|  | maximum fetch count (the sampling period), IBS tags the fetch operation and | 
|  | monitors that operation until it either completes or aborts. When a tagged | 
|  | fetch completes or aborts, a sampling interrupt is generated and an IBS fetch | 
|  | sample is taken. An IBS fetch sample contains a timestamp, the identifier of | 
|  | the interrupted process, the virtual fetch address, and several event flags | 
|  | and values that describe what happened during the fetch operation. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect3" title="4.8.2. IBS Op"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h4 class="title"><a id="ibs-op"></a>4.8.2. IBS Op</h4> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64 | 
|  | instructions. Two options are available for selecting ops for sampling: | 
|  | </p> | 
|  | <div class="itemizedlist"> | 
|  | <ul class="itemizedlist" type="disc"> | 
|  | <li class="listitem"> | 
|  | Cycles-based selection counts CPU clock cycles. The op is tagged and monitored | 
|  | when the count reaches a threshold (the sampling period) and a valid op is | 
|  | available. | 
|  | </li> | 
|  | <li class="listitem"> | 
|  | Dispatched op-based selection counts dispatched macro-ops. | 
|  | When the count reaches a threshold, the next valid op is tagged and monitored. | 
|  | </li> | 
|  | </ul> | 
|  | </div> | 
|  | <p> | 
|  | In both cases, an IBS sample is generated only if the tagged op retires. | 
|  | Thus, IBS op event information does not measure speculative execution activity. | 
|  | The execution stages of the pipeline monitor the tagged macro-op. When the | 
|  | tagged macro-op retires, a sampling interrupt is generated and an IBS op | 
|  | sample is taken. An IBS op sample contains a timestamp, the identifier of | 
|  | the interrupted process, the virtual address of the AMD64 instruction from | 
|  | which the op was issued, and several event flags and values that describe | 
|  | what happened when the macro-op executed. | 
|  | </p> | 
|  | </div> | 
|  | <p> | 
|  | IBS profiling is enabled by specifying IBS performance events | 
|  | through the "--event=" option. These events are listed in the output of | 
|  | <code class="function">opcontrol --list-events</code>. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | opcontrol --event=IBS_FETCH_XXX:<count>:<um>:<kernel>:<user> | 
|  | opcontrol --event=IBS_OP_XXX:<count>:<um>:<kernel>:<user> | 
|  |  | 
|  | Note: All IBS fetch events must have the same event count and unit mask, | 
|  | as must all IBS op events. | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
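<p>
As a rough sketch of the syntax above, the helper below assembles IBS event
specifications and prints the resulting opcontrol command line without running it.
The event names IBS_FETCH_ALL and IBS_OP_ALL, the count of 250000, the unit mask
of 0, and the helper name ibs_event are assumptions made for illustration; check
the output of opcontrol --list-events for the names, minimum counts, and unit
masks that apply to your processor.
</p>

```shell
#!/bin/sh
# Hypothetical helper: join the five fields of an event specification
# (name, count, unit mask, kernel flag, user flag) with colons.
ibs_event() {
    printf '%s:%s:%s:%s:%s' "$1" "$2" "$3" "$4" "$5"
}

# Assumed event names and values. Every IBS fetch event must share one
# count and unit mask, and likewise every IBS op event.
fetch=$(ibs_event IBS_FETCH_ALL 250000 0 1 1)
op=$(ibs_event IBS_OP_ALL 250000 0 1 1)

# Print the command instead of running it.
echo "opcontrol --event=$fetch --event=$op"
```

<p>
Printing the command first makes it easy to check the count and unit mask fields
before handing them to the profiler.
</p>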
|  | </div> | 
|  | <div class="sect2" title="4.9. Dangerous counter settings"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="misuse"></a>4.9. Dangerous counter settings</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | OProfile is a low-level profiler which allows continuous profiling at a low overhead cost. | 
|  | If too low a count reset value is set for a counter, the system can become overloaded with counter | 
|  | interrupts and appear to have frozen. Whilst some validation is done, it | 
|  | is not foolproof. | 
|  | </p> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> | 
|  | <h3 class="title">Note</h3> | 
|  | <p> | 
|  | This can happen as follows: when the profiler count | 
|  | reaches zero, an NMI handler is called which stores the sample values in an internal buffer, then resets the counter | 
|  | to its original value. If the count is very low, a pending NMI can be sent before the NMI handler has | 
|  | completed. Due to the priority of the NMI, the local APIC delivers the pending interrupt immediately after | 
|  | completion of the previous interrupt handler, and control never returns to other parts of the system. | 
|  | In this way, the system seems to be frozen. | 
|  | </p> | 
|  | </div> | 
|  | <p>If this happens, it will be impossible to bring the system back to a workable state. | 
|  | There is no way to provide real security against this happening, other than making sure to use a reasonable value | 
|  | for the counter reset. For example, setting the <code class="constant">CPU_CLK_UNHALTED</code> event type with a ridiculously low reset count (e.g. 500) | 
|  | is likely to freeze the system. | 
|  | </p> | 
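<p>
To get a feel for the numbers, the worst-case interrupt rate for
<code class="constant">CPU_CLK_UNHALTED</code> is roughly the clock frequency divided by the
reset count. The sketch below uses the 863.195 MHz example processor and the count of 23150
that appear in the sample reports later in this manual; the helper name interrupts_per_sec is
made up for illustration, and the result is an upper bound that assumes the CPU never halts.
</p>

```shell
#!/bin/sh
# Worst-case counter interrupts per second for CPU_CLK_UNHALTED:
# clock frequency in Hz divided by the reset count (integer arithmetic).
interrupts_per_sec() {
    echo $(( $1 / $2 ))
}

hz=863195000   # 863.195 MHz example CPU

interrupts_per_sec "$hz" 500      # prints 1726390
interrupts_per_sec "$hz" 23150    # prints 37287
```

<p>
At a count of 500 the handler would be entered well over a million times per second,
which leaves little or no time for anything else.
</p>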
|  | <p> | 
|  | In short : <span class="command"><strong>Don't try a foolish sample count value</strong></span>. Unfortunately the definition of a foolish value | 
|  | is really dependent on the event type - if ever in doubt, e-mail </p> | 
|  | <div class="address"> | 
|  | <p><code class="email"><<a class="email" href="mailto:oprofile-list@lists.sf.net">oprofile-list@lists.sf.net</a>></code>.</p> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="chapter" title="Chapter 4. Obtaining results"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title"><a id="results"></a>Chapter 4. Obtaining results</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="toc"> | 
|  | <p> | 
|  | <b>Table of Contents</b> | 
|  | </p> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#profile-spec">1. Profile specifications</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#profile-spec-examples">1.1. Examples</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#profile-spec-details">1.2. Profile specification parameters</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#no-results">1.4. What to do when you don't get any results</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#opreport">2. Image summaries and symbol summaries (<span class="command"><strong>opreport</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-merging">2.1. Merging separate profiles</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-comparison">2.2. Side-by-side multiple results</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-callgraph">2.3. Callgraph output</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-diff">2.4. Differential profiles with <span class="command"><strong>opreport</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-anon">2.5. Anonymous executable mappings</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-xml">2.6. XML formatted output</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opreport-options">2.7. Options for <span class="command"><strong>opreport</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#opannotate">3. Outputting annotated source (<span class="command"><strong>opannotate</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opannotate-finding-source">3.1. Locating source files</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opannotate-details">3.2. Usage of <span class="command"><strong>opannotate</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#getting-jit-reports">4. OProfile results with JIT samples</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#opgprof">5. <span class="command"><strong>gprof</strong></span>-compatible output (<span class="command"><strong>opgprof</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opgprof-details">5.1. Usage of <span class="command"><strong>opgprof</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#oparchive">6. Archiving measurements (<span class="command"><strong>oparchive</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#oparchive-details">6.1. Usage of <span class="command"><strong>oparchive</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#opimport">7. Converting sample database files (<span class="command"><strong>opimport</strong></span>)</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#opimport-details">7.1. Usage of <span class="command"><strong>opimport</strong></span></a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | <p> | 
|  | OK, so the profiler has been running, but it's not much use unless we can get some data out. Fairly often, | 
|  | OProfile does a little <span class="emphasis"><em>too</em></span> good a job of keeping overhead low, and no data reaches | 
|  | the profiler. This can happen on lightly-loaded machines. Remember you can force a dump at any time with : | 
|  | </p> | 
|  | <p> | 
|  | <span class="command"> | 
|  | <strong>opcontrol --dump</strong> | 
|  | </span> | 
|  | </p> | 
|  | <p>Remember to do this before complaining there is no profiling data ! | 
|  | Now that we've got some data, it has to be processed. That's the job of <span class="command"><strong>opreport</strong></span>, | 
|  | <span class="command"><strong>opannotate</strong></span>, or <span class="command"><strong>opgprof</strong></span>. | 
|  | </p> | 
|  | <div class="sect1" title="1. Profile specifications"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="profile-spec"></a>1. Profile specifications</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | All of the analysis tools take a <span class="emphasis"><em>profile specification</em></span>. | 
|  | This is a set of definitions that describe which actual profiles should be | 
|  | examined. The simplest profile specification is empty: this will match all | 
|  | the available profile files for the current session (this is what happens | 
|  | when you do <span class="command"><strong>opreport</strong></span>). | 
|  | </p> | 
|  | <p> | 
|  | Specification parameters are of the form <code class="option">name:value[,value]</code>. | 
|  | For example, if I wanted to get a combined symbol summary for | 
|  | <code class="filename">/bin/myprog</code> and <code class="filename">/bin/myprog2</code>, | 
|  | I could do <span class="command"><strong>opreport -l image:/bin/myprog,/bin/myprog2</strong></span>. | 
|  | As a special case, you don't actually need to specify the <code class="option">image:</code> | 
|  | part here: anything left on the command line is assumed to be an | 
|  | <code class="option">image:</code> name. Similarly, if no <code class="option">session:</code> | 
|  | is specified, then <code class="option">session:current</code> is assumed ("current" | 
|  | is a special name of the current / last profiling session). | 
|  | </p> | 
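<p>
The defaulting rule for bare words can be pictured with the small sketch below. It is an
illustration only, not the tools' actual parser: the function name normalize_spec is
hypothetical, and the list of recognised parameter names is taken from the parameter
reference later in this section.
</p>

```shell
#!/bin/sh
# Illustration only: treat a bare command-line word as an image:
# parameter, and pass recognised name:value parameters through unchanged.
normalize_spec() {
    case $1 in
        archive:*|session:*|session-exclude:*|image:*|image-exclude:*|\
        lib-image:*|lib-image-exclude:*|event:*|count:*|unit-mask:*|\
        cpu:*|tgid:*|tid:*)
            printf '%s\n' "$1" ;;
        *)
            printf 'image:%s\n' "$1" ;;
    esac
}

normalize_spec /bin/myprog,/bin/myprog2   # prints image:/bin/myprog,/bin/myprog2
normalize_spec session:stresstest         # prints session:stresstest
```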
|  | <p> | 
|  | In addition to the comma-separated list shown above, some of the | 
|  | specification parameters can take <span class="command"><strong>glob</strong></span>-style | 
|  | values. For example, if I want to see image summaries for all | 
|  | binaries profiled in <code class="filename">/usr/bin/</code>, I could do | 
|  | <span class="command"><strong>opreport image:/usr/bin/\*</strong></span>. Note the necessity | 
|  | to escape the special character from the shell. | 
|  | </p> | 
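<p>
Either a backslash or single quotes keeps the shell from expanding the asterisk. The sketch
below, which uses a throwaway function named first_arg, shows that both spellings deliver the
same literal string to the program being run.
</p>

```shell
#!/bin/sh
# Echo back the first argument exactly as the program would receive it.
first_arg() {
    printf '%s\n' "$1"
}

first_arg image:/usr/bin/\*      # prints image:/usr/bin/*
first_arg 'image:/usr/bin/*'     # prints image:/usr/bin/*
```

<p>
Quoting the whole parameter is often the easier habit when a specification contains
several special characters.
</p>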
|  | <p> | 
|  | For <span class="command"><strong>opreport</strong></span>, profile specifications can be used to | 
|  | define two profiles, giving differential output. This is done by | 
|  | enclosing each of the two specifications within curly braces, as shown | 
|  | in the examples below. Any specifications outside of curly braces are | 
|  | shared across both. | 
|  | </p> | 
|  | <div class="sect2" title="1.1. Examples"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="profile-spec-examples"></a>1.1. Examples</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Image summaries for all profiles with <code class="constant">DATA_MEM_REFS</code> | 
|  | samples in the saved session called "stresstest" : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opreport session:stresstest event:DATA_MEM_REFS | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Symbol summary for the application called "test_sym53c8xx,9xx". Note the | 
|  | escaping is necessary as <code class="option">image:</code> takes a comma-separated list. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opreport -l ./test/test_sym53c8xx\,9xx | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Image summaries for all binaries in the <code class="filename">test</code> directory, | 
|  | excepting <code class="filename">boring-test</code> : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opreport image:./test/\* image-exclude:./test/boring-test | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Differential profile of a binary stored in two archives : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opreport -l /bin/bash { archive:./orig } { archive:./new } | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Differential profile of an archived binary with the current session : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opreport -l /bin/bash { archive:./orig } { } | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | </div> | 
|  | <div class="sect2" title="1.2. Profile specification parameters"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="profile-spec-details"></a>1.2. Profile specification parameters</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">archive:</code> | 
|  | <span class="emphasis"> | 
|  | <em>archivepath</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | A path to an archive made with <span class="command"><strong>oparchive</strong></span>. | 
|  | Absence of this tag, unlike others, means "the current system", | 
|  | equivalent to specifying "archive:". | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">session:</code> | 
|  | <span class="emphasis"> | 
|  | <em>sessionlist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | A comma-separated list of session names to resolve. Absence of this | 
|  | tag, unlike others, means "the current session", equivalent to | 
|  | specifying "session:current". | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">session-exclude:</code> | 
|  | <span class="emphasis"> | 
|  | <em>sessionlist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | A comma-separated list of sessions to exclude. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">image:</code> | 
|  | <span class="emphasis"> | 
|  | <em>imagelist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | A comma-separated list of image names to resolve. Each entry may be a relative | 
|  | path, a <span class="command"><strong>glob</strong></span>-style name, or a full path, e.g.</p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen">opreport 'image:/usr/bin/oprofiled,*op*,./opreport'</pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">image-exclude:</code> | 
|  | <span class="emphasis"> | 
|  | <em>imagelist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Same as <code class="option">image:</code>, but the matching images are excluded. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">lib-image:</code> | 
|  | <span class="emphasis"> | 
|  | <em>imagelist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Same as <code class="option">image:</code>, but only for images that are for | 
|  | a particular primary binary image (namely, an application). This only | 
|  | makes sense to use if you're using <code class="option">--separate</code>. | 
|  | This includes kernel modules and the kernel when using | 
|  | <code class="option">--separate=kernel</code>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">lib-image-exclude:</code> | 
|  | <span class="emphasis"> | 
|  | <em>imagelist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Same as <code class="option">lib-image:</code>, but the matching images | 
|  | are excluded. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">event:</code> | 
|  | <span class="emphasis"> | 
|  | <em>eventlist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | The symbolic event name to match on, e.g. <code class="option">event:DATA_MEM_REFS</code>. | 
|  | You can pass a list of events for side-by-side comparison with <span class="command"><strong>opreport</strong></span>. | 
|  | When using the timer interrupt, the event is always "TIMER". | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">count:</code> | 
|  | <span class="emphasis"> | 
|  | <em>eventcountlist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | The event count to match on, e.g. <code class="option">event:DATA_MEM_REFS count:30000</code>. | 
|  | Note that this value refers to the setting used for <span class="command"><strong>opcontrol</strong></span> | 
|  | only, and has nothing to do with the sample counts in the profile data | 
|  | itself. | 
|  | You can pass a list of event counts for side-by-side comparison with <span class="command"><strong>opreport</strong></span>. | 
|  | When using the timer interrupt, the count is always 0 (indicating it cannot be set). | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">unit-mask:</code> | 
|  | <span class="emphasis"> | 
|  | <em>masklist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | The unit mask value of the event to match on, e.g. <code class="option">unit-mask:1</code>. | 
|  | You can pass a list of unit masks for side-by-side comparison with <span class="command"><strong>opreport</strong></span>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">cpu:</code> | 
|  | <span class="emphasis"> | 
|  | <em>cpulist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only consider profiles for the given numbered CPUs (numbering starts from zero). | 
|  | This is only useful when using CPU profile separation. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">tgid:</code> | 
|  | <span class="emphasis"> | 
|  | <em>pidlist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only consider profiles for the given task groups. Unless some program | 
|  | is using threads, the task group ID of a process is the same | 
|  | as its process ID. This option corresponds to the POSIX | 
|  | notion of a thread group. | 
|  | This is only useful when using per-process profile separation. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">tid:</code> | 
|  | <span class="emphasis"> | 
|  | <em>tidlist</em> | 
|  | </span> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only consider profiles for the given threads. When using | 
|  | recent thread libraries, all threads in a process share the | 
|  | same task group ID, but have different thread IDs. You can | 
|  | use this option in combination with <code class="option">tgid:</code> to | 
|  | restrict the results to particular threads within a process. | 
|  | This is only useful when using per-process profile separation. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
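<p>
Several parameters can be combined on one command line. The sketch below builds a symbol
summary request for a single thread group, restricted to one event and count. The values are
placeholders borrowed from examples in this chapter, the function name report_one_process and
the RUN variable are hypothetical, and the command is only printed unless RUN is set to an
empty string.
</p>

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Start the script with RUN= (empty) in the environment to run it for real.
RUN=${RUN-echo}

# Symbol summary for one thread group of /bin/myprog, restricted to one
# event and to the count that was given to opcontrol.
report_one_process() {
    $RUN opreport -l image:/bin/myprog event:CPU_CLK_UNHALTED count:23150 tgid:"$1"
}

report_one_process 3433
```

<p>
As noted above, the tgid: parameter only helps when per-process profile separation was
in use during the profiling session.
</p>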
|  | </div> | 
|  | <div class="sect2" title="1.3. Locating and managing binary images"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="locating-and-managing-binary-images"></a>1.3. Locating and managing binary images</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Each session's sample files can be found in the $SESSION_DIR/samples/ directory (default: <code class="filename">/var/lib/oprofile/samples/</code>). | 
|  | These are used, along with the binary image files, to produce human-readable data. | 
|  | In some circumstances (kernel modules in an initrd, or modules on 2.6 kernels), OProfile | 
|  | will not be able to find the binary images. All the tools have an <code class="option">--image-path</code> | 
|  | option to which you can pass a comma-separated list of alternate paths to search. For example, | 
|  | I can let OProfile find my 2.6 modules by using <span class="command"><strong>--image-path /lib/modules/2.6.0/kernel/</strong></span>. | 
|  | It is your responsibility to ensure that the correct images are found when using this | 
|  | option. | 
|  | </p> | 
|  | <p> | 
|  | Note that if a binary image changes after the sample file was created, you won't be able to get useful | 
|  | symbol-based data out. This situation is detected for you. If you replace a binary, you should | 
|  | make sure to save the old binary if you need to do comparative profiles. | 
|  | </p> | 
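<p>
When several directories need to be searched, the comma-separated list can be assembled
with a small helper such as the one sketched below. The helper name join_paths and the
second directory are made up for illustration, and the final command is printed rather
than executed.
</p>

```shell
#!/bin/sh
# Join directory names into the comma-separated list that
# --image-path expects.
join_paths() {
    old_ifs=$IFS
    IFS=,
    printf '%s\n' "$*"
    IFS=$old_ifs
}

# The second directory is a hypothetical location for unpacked initrd modules.
paths=$(join_paths /lib/modules/2.6.0/kernel/ /tmp/initrd-modules/)
echo "opreport -l --image-path $paths"
```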
|  | </div> | 
|  | <div class="sect2" title="1.4. What to do when you don't get any results"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="no-results"></a>1.4. What to do when you don't get any results</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | When attempting to get output, you may see the error : | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | error: no sample files found: profile specification too strict ? | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | What this is saying is that the profile specification you passed in, | 
|  | when matched against the available sample files, resulted in no matches. | 
|  | There are a number of reasons this might happen: | 
|  | </p> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term">spelling</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | You specified a binary name, but spelt it wrongly. Check your spelling ! | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">profiler wasn't running</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Make very sure that OProfile was actually up and running when you ran | 
|  | the binary. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">binary didn't run long enough</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Remember OProfile is a statistical profiler - you're not guaranteed to | 
|  | get samples for short-running programs. You can help this by using a | 
|  | lower count for the performance counter, so there are a lot more samples | 
|  | taken per second. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">binary spent most of its time in libraries</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Similarly, if the binary spends little time in the main binary image | 
|  | itself, with most of it spent in shared libraries it uses, you might | 
|  | not see any samples for the binary image itself. You can check this | 
|  | by using <span class="command"><strong>opcontrol --separate=lib</strong></span> before the | 
|  | profiling session, so <span class="command"><strong>opreport</strong></span> and friends show | 
|  | the library profiles on a per-application basis. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">specification was really too strict</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | For example, you specified something like <code class="option">tgid:3433</code>, | 
|  | but no task with that group ID ever ran the code. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">binary didn't generate any events</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | If you're using a particular event counter, for example counting MMX | 
|  | operations, the code might simply have not generated any events in the | 
|  | first place. Verify the code you're profiling does what you expect it | 
|  | to. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term">you didn't specify kernel module name correctly</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | If you're using 2.6 kernels, and trying to get reports for a kernel | 
|  | module, make sure to use the <code class="option">-p</code> option, and specify the | 
|  | module name <span class="emphasis"><em>with</em></span> the <code class="filename">.ko</code> | 
|  | extension. Check if the module is one loaded from initrd. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
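<p>
A quick first check is whether any sample files exist at all. The sketch below counts the
files under the current session's sample directory. It assumes the default layout described
earlier in this chapter, with the current session stored in a subdirectory named current;
the function name count_sample_files is hypothetical.
</p>

```shell
#!/bin/sh
# Count the regular files below a session's sample directory.
# Prints 0 if the directory does not exist.
count_sample_files() {
    if [ -d "$1" ]; then
        find "$1" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Assumed default location; override SESSION_DIR if yours differs.
session_dir=${SESSION_DIR:-/var/lib/oprofile}
n=$(count_sample_files "$session_dir/samples/current")

if [ "$n" -eq 0 ]; then
    echo "no sample files yet: check that the profiler ran and that opcontrol --dump was used"
else
    echo "$n sample files found: the profile specification may be too strict"
fi
```

<p>
If files are present but the report is still empty, work through the specification-related
causes in the list above.
</p>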
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="2. Image summaries and symbol summaries (opreport)"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="opreport"></a>2. Image summaries and symbol summaries (<span class="command"><strong>opreport</strong></span>)</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The <span class="command"><strong>opreport</strong></span> utility is the primary utility you will use for | 
|  | getting formatted data out of OProfile. It produces two types of data: image summaries | 
|  | and symbol summaries. An image summary lists the number of samples for individual | 
|  | binary images such as libraries or applications. Symbol summaries provide per-symbol | 
|  | profile data. In the following example, we're getting an image summary for the whole | 
|  | system: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opreport --long-filenames | 
|  | CPU: PIII, speed 863.195 MHz (estimated) | 
|  | Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150 | 
|  | 905898 59.7415 /usr/lib/gcc-lib/i386-redhat-linux/3.2/cc1plus | 
|  | 214320 14.1338 /boot/2.6.0/vmlinux | 
|  | 103450  6.8222 /lib/i686/libc-2.3.2.so | 
|  | 60160  3.9674 /usr/local/bin/madplay | 
|  | 31769  2.0951 /usr/local/oprofile-pp/bin/oprofiled | 
|  | 26550  1.7509 /usr/lib/libartsflow.so.1.0.0 | 
|  | 23906  1.5765 /usr/bin/as | 
|  | 18770  1.2378 /oprofile | 
|  | 15528  1.0240 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5 | 
|  | 11979  0.7900 /usr/X11R6/bin/XFree86 | 
|  | 11328  0.7471 /bin/bash | 
|  | ... | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
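<p>
Because the image summary is plain text with one image per line, it is easy to post-process.
The sketch below keeps only the images at or above a given percentage of samples. The function
name top_images is made up, the column layout is assumed to match the report shown above
(samples, percentage, image name), and a few lines copied from that report stand in for real
output.
</p>

```shell
#!/bin/sh
# Print the images whose share of samples is at least the given
# percentage. Data lines look like: "<samples> <percent> <image>".
# Header lines are skipped because their first field is not a number.
top_images() {
    awk -v min="$1" '$1 ~ /^[0-9]+$/ && $2 + 0 >= min { print $3 }'
}

# Sample input taken from the report above.
top_images 5 <<'EOF'
CPU: PIII, speed 863.195 MHz (estimated)
905898 59.7415 /usr/lib/gcc-lib/i386-redhat-linux/3.2/cc1plus
214320 14.1338 /boot/2.6.0/vmlinux
103450  6.8222 /lib/i686/libc-2.3.2.so
60160  3.9674 /usr/local/bin/madplay
EOF
```

<p>
With real data the same filter would be fed from a pipe, for example by sending the output of
opreport --long-filenames into top_images 5.
</p>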
|  | <p> | 
|  | If we had specified <code class="option">--symbols</code> in the previous command, we would have | 
|  | gotten a symbol summary of all the images across the entire system. We can restrict this to only | 
|  | part of the system profile; for example, | 
|  | below is a symbol summary of the OProfile daemon. Note that as we used | 
|  | <span class="command"><strong>opcontrol --separate=kernel</strong></span>, symbols from images that <span class="command"><strong>oprofiled</strong></span> | 
|  | has used are also shown. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opreport -l `which oprofiled` 2>/dev/null | more | 
|  | CPU: PIII, speed 863.195 MHz (estimated) | 
|  | Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150 | 
|  | vma      samples  %           image name               symbol name | 
|  | 0804be10 14971    28.1993     oprofiled                odb_insert | 
|  | 0804afdc 7144     13.4564     oprofiled                pop_buffer_value | 
|  | c01daea0 6113     11.5144     vmlinux                  __copy_to_user_ll | 
|  | 0804b060 2816      5.3042     oprofiled                opd_put_sample | 
|  | 0804b4a0 2147      4.0441     oprofiled                opd_process_samples | 
|  | 0804acf4 1855      3.4941     oprofiled                opd_put_image_sample | 
|  | 0804ad84 1766      3.3264     oprofiled                opd_find_image | 
|  | 0804a5ec 1084      2.0418     oprofiled                opd_find_module | 
|  | 0804ba5c 741       1.3957     oprofiled                odb_hash_add_node | 
|  | ... | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | These are the two basic forms of output you are most likely to use regularly, but <span class="command"><strong>opreport</strong></span> | 
|  | can do a lot more than that, as described below. | 
|  | </p> | 
|  | <div class="sect2" title="2.1. Merging separate profiles"><div class="titlepage"><div><div><h3 class="title"><a id="opreport-merging"></a>2.1. Merging separate profiles</h3></div></div></div> | 
|  | <p> | 
|  | If you have used one of the <code class="option">--separate=</code> options | 
|  | whilst profiling, there can be several separate profiles for | 
|  | a single binary image within a session. Normally the output | 
|  | will keep these images separated (so, for example, the image summary | 
|  | output shows library image summaries on a per-application basis, | 
|  | when using <code class="option">--separate=lib</code>). | 
|  | Sometimes it can be useful to merge these results back together | 
|  | before getting results. The <code class="option">--merge</code> option allows | 
|  | you to do that. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="2.2. Side-by-side multiple results"><div class="titlepage"><div><div><h3 class="title"><a id="opreport-comparison"></a>2.2. Side-by-side multiple results</h3></div></div></div> | 
|  | <p> | 
|  | If you have used multiple events when profiling, by default you get | 
|  | side-by-side results of each event's sample values from <span class="command"><strong>opreport</strong></span>. | 
|  | You can restrict which events to list by appropriate use of the | 
|  | <code class="option">event:</code> profile specifications, etc. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="2.3. Callgraph output"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opreport-callgraph"></a>2.3. Callgraph output</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | This section provides details on how to use the OProfile callgraph feature. | 
|  | </p> | 
|  | <div class="sect3" title="2.3.1. Callgraph details"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h4 class="title"><a id="op-cg1"></a>2.3.1. Callgraph details</h4> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | When using the <code class="option">opcontrol --callgraph</code> option, you can see what | 
|  | functions are calling other functions in the output. Consider the | 
|  | following program: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
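|  | #define _GNU_SOURCE /* strfry() is a GNU extension */ | 
|  | #include <string.h> | 
|  | #include <stdlib.h> | 
|  | #include <stdio.h> | 
|  |  | 
|  | #define SIZE 500000 | 
|  |  | 
|  | /* qsort() passes pointers to the array elements, i.e. char ** here */ | 
|  | static int compare(const void *s1, const void *s2) | 
|  | { | 
|  |         return strcmp(*(char * const *)s1, *(char * const *)s2); | 
|  | } | 
|  |  | 
|  | static void repeat(void) | 
|  | { | 
|  |         int i; | 
|  |         char *strings[SIZE]; | 
|  |         char str[] = "abcdefghijklmnopqrstuvwxyz"; | 
|  |  | 
|  |         /* build SIZE shuffled copies of the alphabet */ | 
|  |         for (i = 0; i < SIZE; ++i) { | 
|  |                 strings[i] = strdup(str); | 
|  |                 strfry(strings[i]); | 
|  |         } | 
|  |  | 
|  |         qsort(strings, SIZE, sizeof(char *), compare); | 
|  |  | 
|  |         /* release the copies so repeated calls do not leak memory */ | 
|  |         for (i = 0; i < SIZE; ++i) | 
|  |                 free(strings[i]); | 
|  | } | 
|  |  | 
|  | int main() | 
|  | { | 
|  |         while (1) | 
|  |                 repeat(); | 
|  | } | 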
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | When running with the call-graph option, OProfile will | 
|  | record the function stack every time it takes a sample. | 
|  | <span class="command"><strong>opreport --callgraph</strong></span> outputs an entry for each | 
|  | function, where each entry looks similar to: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | samples  %        image name               symbol name | 
|  |   197       0.1548  cg                       main | 
|  |   127036   99.8452  cg                       repeat | 
|  | 84590    42.5084  libc-2.3.2.so            strfry | 
|  |   84590    66.4838  libc-2.3.2.so            strfry [self] | 
|  |   39169    30.7850  libc-2.3.2.so            random_r | 
|  |   3475      2.7312  libc-2.3.2.so            __i686.get_pc_thunk.bx | 
|  | ------------------------------------------------------------------------------- | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Here the non-indented line is the function we're focussing upon | 
|  | (<code class="function">strfry()</code>). This | 
|  | line is the same as you'd get from a normal <span class="command"><strong>opreport</strong></span> | 
|  | output. | 
|  | </p> | 
|  | <p> | 
|  | Above the non-indented line we find the functions that called this | 
|  | function (for example, <code class="function">repeat()</code> calls | 
|  | <code class="function">strfry()</code>). The samples and percentage values here | 
|  | refer to the number of times we took a sample where this call was found | 
|  | in the stack; the percentage is relative to all other callers of the | 
|  | function we're focussing on. Note that these values are | 
|  | <span class="emphasis"><em>not</em></span> call counts; they only reflect the call stack | 
|  | every time a sample is taken; that is, if a call is found in the stack | 
|  | at the time of a sample, it is recorded in this count. | 
|  | </p> | 
|  | <p> | 
|  | Below the line are functions that are called by | 
|  | <code class="function">strfry()</code> (called <span class="emphasis"><em>callees</em></span>). | 
|  | It's clear here that <code class="function">strfry()</code> calls | 
|  | <code class="function">random_r()</code>. We also see a special entry with a | 
|  | "[self]" marker. This records the normal samples for the function, but | 
|  | the percentage becomes relative to all callees. This allows you to | 
|  | compare time spent in the function itself with time spent in the functions it | 
|  | calls. Note that if a function calls itself, then it will appear in the | 
|  | list of callees of itself, but without the "[self]" marker; so recursive | 
|  | calls are still clearly separable. | 
|  | </p> | 
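|  | <p> | 
|  | As a rough check on how these relative percentages arise, the following shell sketch | 
|  | recomputes them from the sample counts in the example output above: each caller's count | 
|  | is divided by the sum over all callers, and each callee's count (including the "[self]" | 
|  | entry) by the sum over all callees. The helper name <code class="function">relative_percent</code> | 
|  | is invented for this illustration and is not part of OProfile. | 
|  | </p> | 

```shell
#!/bin/sh
# relative_percent: read "<samples> <name>" lines on stdin and print each
# name with its share of the group total, as a percentage.
# (Hypothetical helper for illustration; not an OProfile tool.)
relative_percent() {
    awk '
        { count[NR] = $1; name[NR] = $2; total += $1 }
        END {
            if (total > 0)
                for (i = 1; i <= NR; i++)
                    printf "%-24s %8.4f\n", name[i], 100 * count[i] / total
        }
    '
}

# Callers of strfry(), taken from the example output above.
printf '%s\n' '197 main' '127036 repeat' | relative_percent

# Callees of strfry(), including the [self] entry.
printf '%s\n' '84590 strfry[self]' '39169 random_r' \
    '3475 __i686.get_pc_thunk.bx' | relative_percent
```

|  | <p> | 
|  | The percentages printed for each group should agree with the caller and callee lines in the example. | 
|  | </p> | 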
|  | <p> | 
|  | You may have noticed that the output lists <code class="function">main()</code> | 
|  | as calling <code class="function">strfry()</code>, but it's clear from the source | 
|  | that this doesn't actually happen. See <a class="xref" href="#interpreting-callgraph" title="3. Interpreting call-graph profiles">Section 3, “Interpreting call-graph profiles”</a> for an explanation. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect3" title="2.3.2. Callgraph and JIT support"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h4 class="title"><a id="cg-with-jitsupport"></a>2.3.2. Callgraph and JIT support</h4> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Callgraph output where anonymously mapped code is in the callstack can sometimes be misleading. | 
|  | For all such code, the samples for the anonymously mapped code are stored in a samples subdirectory | 
|  | named <code class="filename">{anon:anon}/<tgid>.<begin_addr>.<end_addr></code>. | 
|  | As stated earlier, if this anonymously mapped code is JITed code from a supported VM like Java, | 
|  | OProfile creates an ELF file to provide a (somewhat) permanent backing file for the code. | 
|  | However, when viewing callgraph output, any anonymously mapped code in the callstack | 
|  | will be attributed to <code class="filename">anon (tgid:<tgid> range:<begin_addr>-<end_addr>)</code>, | 
|  | even if a <code class="filename">.jo</code> ELF file had been created for it.  See the example below. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | ------------------------------------------------------------------------------- | 
|  |   1         2.2727  libj9ute23.so            java.bin                 traceV | 
|  |   2         4.5455  libj9ute23.so            java.bin                 utsTraceV | 
|  |   4         9.0909  libj9trc23.so            java.bin                 fillInUTInterfaces | 
|  |   37       84.0909  libj9trc23.so            java.bin                 twGetSequenceCounter | 
|  | 8         0.0154  libj9prt23.so            java.bin                 j9time_hires_clock | 
|  |   27       61.3636  anon (tgid:10014 range:0x100000-0x103000) java.bin                 (no symbols) | 
|  |   9        20.4545  libc-2.4.so              java.bin                 gettimeofday | 
|  |   8        18.1818  libj9prt23.so            java.bin                 j9time_hires_clock [self] | 
|  | ------------------------------------------------------------------------------- | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | The output shows that "anon (tgid:10014 range:0x100000-0x103000)" was a callee of | 
|  | <code class="code">j9time_hires_clock</code>, even though the ELF file <code class="filename">10014.jo</code> was | 
|  | created for this profile run.  Unfortunately, there is currently no way to correlate | 
|  | that anonymous callgraph entry with its corresponding <code class="filename">.jo</code> file. | 
|  | </p> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect2" title="2.4. Differential profiles with opreport"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opreport-diff"></a>2.4. Differential profiles with <span class="command"><strong>opreport</strong></span></h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Often, we'd like to be able to compare two profiles. For example, when | 
|  | analysing the performance of an application, we'd like to make code | 
|  | changes and examine the effect of the change. This is supported in | 
|  | <span class="command"><strong>opreport</strong></span> by giving a profile specification that | 
|  | identifies two different profiles. The general form is: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opreport <shared-spec> { <first-profile> } { <second-profile> } | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> | 
|  | <h3 class="title">Note</h3> | 
|  | <p> | 
|  | We lost our Dragon book down the back of the sofa, so you have to be | 
|  | careful to have spaces around those braces, or things will get | 
|  | hopelessly confused. We can only apologise. | 
|  | </p> | 
|  | </div> | 
|  | <p> | 
|  | For each of the profiles, the shared section is prefixed, and then the | 
|  | specification is analysed. The usual parameters work both within the | 
|  | shared section, and in the sub-specification within the curly braces. | 
|  | </p> | 
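|  | <p> | 
|  | The prefixing step can be pictured with a small shell sketch. It assumes the shared | 
|  | section comes first, as in the general form above, and that braces are recognised only | 
|  | when they are separate words, which is why the spaces matter. The function | 
|  | <code class="function">expand_specs</code> is a hypothetical illustration, not | 
|  | <span class="command"><strong>opreport</strong></span>'s actual parser. | 
|  | </p> | 

```shell
#!/bin/sh
# expand_specs TOKEN...
# Split "<shared-spec> { <first> } { <second> }" into one line per profile,
# each consisting of the shared tokens followed by that profile's own tokens.
# Braces are recognised only as separate arguments.
# (Hypothetical helper; an assumption about the behaviour described above.)
expand_specs() {
    shared=""
    current=""
    inside=0
    for tok in "$@"; do
        case $tok in
            "{") inside=1; current="" ;;
            "}") inside=0
                 printf '%s\n' "$shared$current" | sed 's/^ //' ;;
            *)   if [ "$inside" -eq 1 ]; then
                     current="$current $tok"
                 else
                     shared="$shared $tok"
                 fi ;;
        esac
    done
}

# Same shape as the general form: shared spec, then two sub-specifications.
expand_specs ./a '{' archive:./orig '}' '{' '}'
```

|  | <p> | 
|  | With this input the sketch prints one combined specification per profile: the archive | 
|  | one, then the shared section on its own for the empty braces. | 
|  | </p> | 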
|  | <p> | 
|  | A typical way to use this feature is with archives created with | 
|  | <span class="command"><strong>oparchive</strong></span>. Let's look at an example: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ ./a | 
|  | $ oparchive -o orig ./a | 
|  | $ opcontrol --reset | 
|  | # edit and recompile a | 
|  | $ ./a | 
|  | # now compare the current profile of a with the archived profile | 
|  | $ opreport -xl ./a { archive:./orig } { } | 
|  | CPU: PIII, speed 863.233 MHz (estimated) | 
|  | Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a | 
|  | unit mask of 0x00 (No unit mask) count 100000 | 
|  | samples  %        diff %    symbol name | 
|  | 92435    48.5366  +0.4999   a | 
|  | 54226    ---      ---       c | 
|  | 49222    25.8459  +++       d | 
|  | 48787    25.6175  -2.2e-01  b | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Note that we specified an empty second profile in the curly braces, as | 
|  | we wanted to use the current session; alternatively, we could | 
|  | have specified another archive, or a tgid etc. We specified the binary | 
|  | <span class="command"><strong>a</strong></span> in the shared section, so we matched that in both | 
|  | the profiles we're diffing. | 
|  | </p> | 
|  | <p> | 
|  | As in the normal output, the results are sorted by the number of | 
|  | samples, and the percentage field represents the relative percentage of | 
|  | the symbol's samples in the second profile. | 
|  | </p> | 
|  | <p> | 
|  | Notice the new column in the output. This value represents the | 
|  | percentage change of the relative percent between the first and the | 
|  | second profile: roughly, "how much more important this symbol is". | 
|  | Looking at the symbol <code class="function">a()</code>, we can see that it took | 
|  | roughly the same amount of the total profile in both the first and the | 
|  | second profile. The function <code class="function">c()</code> was not in the new | 
|  | profile, so has been marked with <code class="function">---</code>. Note that the | 
|  | sample value is the number of samples in the first profile; since we're | 
|  | displaying results for the second profile, we don't list a percentage | 
|  | value for it, as it would be meaningless. <code class="function">d()</code> is | 
|  | new in the second profile, and consequently marked with | 
|  | <code class="function">+++</code>. | 
|  | </p> | 
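|  | <p> | 
|  | One way to read "percentage change of the relative percent" is | 
|  | (new - old) / old * 100, where old and new are the symbol's share of all samples in the | 
|  | first and second profile respectively. The sketch below works under that assumption; the | 
|  | helper <code class="function">diff_percent</code> and the sample counts passed to it are | 
|  | made up for illustration and are not taken from <span class="command"><strong>opreport</strong></span>. | 
|  | </p> | 

```shell
#!/bin/sh
# diff_percent OLD_SAMPLES OLD_TOTAL NEW_SAMPLES NEW_TOTAL
# Print the change in a symbol's relative share between two profiles,
# mimicking the "---" and "+++" markers for symbols missing on one side.
# Both totals must be non-zero.
# (Hypothetical helper; the formula is an assumption, see the text.)
diff_percent() {
    awk -v os="$1" -v ot="$2" -v ns="$3" -v nt="$4" 'BEGIN {
        if (ns == 0) { print "---"; exit }   # only in the first profile
        if (os == 0) { print "+++"; exit }   # only in the second profile
        old = os / ot                        # relative share, first profile
        new = ns / nt                        # relative share, second profile
        printf "%+.4f\n", 100 * (new - old) / old
    }'
}

diff_percent 400 1000 500 1000   # share grows from 40% to 50%
diff_percent 300 1000 0 1000     # symbol vanished from the second profile
diff_percent 0 1000 250 1000     # symbol is new in the second profile
```

|  | <p> | 
|  | A symbol whose share grows from 40% to 50% is reported as +25.0000 here: its share has | 
|  | grown by a quarter, even though it gained only ten percentage points. | 
|  | </p> | 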
|  | <p> | 
|  | When comparing profiles between different binaries, it should be clear | 
|  | that functions can change in terms of VMA and size. To avoid this | 
|  | problem, <span class="command"><strong>opreport</strong></span> considers a symbol to be the same | 
|  | if the symbol name, image name, and owning application name all match; | 
|  | any other factors are ignored. Note that the check for application name | 
|  | means that trying to compare library profiles between two different | 
|  | applications will not work as you might expect: each symbol will be | 
|  | considered different. | 
|  | </p> | 
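|  | <p> | 
|  | The matching rule amounts to a join on the triple (symbol name, image name, application | 
|  | name), with the VMA ignored. The following sketch applies that rule to two made-up | 
|  | listings; <code class="function">match_symbols</code> and the data are hypothetical and | 
|  | only illustrate the rule, not how <span class="command"><strong>opreport</strong></span> implements it. | 
|  | </p> | 

```shell
#!/bin/sh
# match_symbols FIRST SECOND
# Each file holds lines of "vma symbol image application".
# A symbol in SECOND is "same" if its (symbol, image, application) triple
# also occurs in FIRST; the VMA is ignored. Otherwise it is "new".
# (Hypothetical helper for illustration; FIRST must not be empty.)
match_symbols() {
    awk '
        NR == FNR { seen[$2 SUBSEP $3 SUBSEP $4] = 1; next }
        {
            key = $2 SUBSEP $3 SUBSEP $4
            print $2, $3, $4, ((key in seen) ? "same" : "new")
        }
    ' "$1" "$2"
}

first=$(mktemp) || exit 1
second=$(mktemp) || exit 1
# Made-up data: foo moved to a new VMA; memcpy is profiled under another app.
printf '%s\n' '0804be10 foo a a' '4000a000 memcpy libc.so a' > "$first"
printf '%s\n' '0804c220 foo a a' '4000a000 memcpy libc.so b' > "$second"
match_symbols "$first" "$second"
rm -f "$first" "$second"
```

|  | <p> | 
|  | Here <code class="function">foo</code> still matches despite its changed VMA, while the | 
|  | library symbol does not, because the owning application differs. | 
|  | </p> | 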
|  | </div> | 
|  | <div class="sect2" title="2.5. Anonymous executable mappings"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opreport-anon"></a>2.5. Anonymous executable mappings</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Many applications, typically ones involving dynamic compilation into | 
|  | machine code (just-in-time, or "JIT", compilation), have executable mappings that | 
|  | are not backed by an ELF file. <span class="command"><strong>opreport</strong></span> has basic support for showing the | 
|  | samples taken in these regions; for example: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opreport /usr/bin/mono -l | 
|  | CPU: ppc64 POWER5, speed 1654.34 MHz (estimated) | 
|  | Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000 | 
|  | samples  %        image name                                    symbol name | 
|  | 47       58.7500  mono                                          (no symbols) | 
|  | 14       17.5000  anon (tgid:3189 range:0xf72aa000-0xf72fa000)  (no symbols) | 
|  | 9        11.2500  anon (tgid:3189 range:0xf6cca000-0xf6dd9000)  (no symbols) | 
|  | .        .        .                                             . | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Note that, since such mappings are dependent upon individual invocations of | 
|  | a binary, these mappings are always listed as a dependent image, | 
|  | even when using <code class="option">--separate=none</code>. | 
|  | Equally, the results are not affected by the <code class="option">--merge</code> | 
|  | option. | 
|  | </p> | 
|  | <p> | 
|  | As shown in the opreport output above, OProfile is unable to attribute the samples to any | 
|  | symbol(s) because there is no ELF file for this code. | 
|  | Enhanced support for JITed code is now available for some virtual machines; | 
|  | e.g., the Java Virtual Machine.  For details about OProfile output for | 
|  | JITed code, see <a class="xref" href="#getting-jit-reports" title="4. OProfile results with JIT samples">Section 4, “OProfile results with JIT samples”</a>. | 
|  | </p> | 
|  | <p>For more information about JIT support in OProfile, see <a class="xref" href="#jitsupport" title="1.1. Support for dynamically compiled (JIT) code">Section 1.1, “Support for dynamically compiled (JIT) code”</a>. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="2.6. XML formatted output"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opreport-xml"></a>2.6. XML formatted output</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The --xml option can be used to generate XML instead of the usual | 
|  | text format.  This allows opreport to eliminate some of the constraints | 
|  | dictated by the two-dimensional text format.  For example, it is possible | 
|  | to separate the sample data across multiple events, CPUs and threads.  The XML | 
|  | schema implemented by opreport is found in doc/opreport.xsd. It contains | 
|  | more detailed comments about the structure of the XML generated by opreport. | 
|  | </p> | 
|  | <p> | 
|  | Since XML is consumed by a client program rather than a user, its structure | 
|  | is fairly static.  In particular, the --sort option is incompatible with the | 
|  | --xml option.  Percentages are not displayed in the XML so the options related | 
|  | to percentages will have no effect.  Full pathnames are always displayed in | 
|  | the XML so --long-filenames is not necessary.  The --details option will cause | 
|  | all of the individual sample data to be included in the XML as well as the | 
|  | instruction byte stream for each symbol (for doing disassembly) and can result | 
|  | in very large XML files. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="2.7. Options for opreport"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opreport-options"></a>2.7. Options for <span class="command"><strong>opreport</strong></span></h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--accumulated / -a</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Accumulate sample and percentage counts in the symbol list. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--callgraph / -c</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show callgraph information. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--debug-info / -g</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show source file and line for each symbol. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--demangle / -D none|normal|smart</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | none: no demangling. normal: use default demangler (default). smart: use | 
|  | pattern-matching to make C++ symbol demangling more readable. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--details / -d</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show per-instruction details for all selected symbols. Note that, for | 
|  | binaries without symbol information, the VMA values shown are raw file | 
|  | offsets for the image binary. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--exclude-dependent / -x</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Do not include application-specific images for libraries, kernel modules | 
|  | and the kernel. This option only makes sense if the profile session | 
|  | used --separate. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--exclude-symbols / -e [symbols]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Exclude all the symbols in the given comma-separated list. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--global-percent / -%</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Make all percentages relative to the whole profile. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--help / -? / --usage</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show help message. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--image-path / -p [paths]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Comma-separated list of additional paths to search for binaries. | 
|  | This is needed to find modules in kernels 2.6 and upwards. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--root / -R [path]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | A path to a filesystem to search for additional binaries. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--include-symbols / -i [symbols]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only include symbols in the given comma-separated list. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--long-filenames / -f</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Output full paths instead of basenames. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--merge / -m [lib,cpu,tid,tgid,unitmask,all]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Merge any profiles separated in a --separate session. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--no-header</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Don't output a header detailing profiling parameters. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--output-file / -o [file]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Output to the given file instead of stdout. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--reverse-sort / -r</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Reverse the sort from the default. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"><code class="option">--session-dir=</code>dir_path</span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Use sample database out of directory <code class="filename">dir_path</code> | 
|  | instead of the default location (/var/lib/oprofile). | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--show-address / -w</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show the VMA address of each symbol (off by default). | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--sort / -s [vma,sample,symbol,debug,image]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Sort the list of symbols by, respectively, symbol address, | 
|  | number of samples, symbol name, debug filename and line number, | 
|  | binary image filename. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--symbols / -l</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | List per-symbol information instead of a binary image summary. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--threshold / -t [percentage]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only output data for symbols that have more than the given percentage | 
|  | of total samples. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--verbose / -V [options]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Give verbose debugging output. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--version / -v</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show version. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--xml / -X</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Generate XML output. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="3. Outputting annotated source (opannotate)"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="opannotate"></a>3. Outputting annotated source (<span class="command"><strong>opannotate</strong></span>)</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The <span class="command"><strong>opannotate</strong></span> utility generates annotated source files or assembly listings, optionally | 
|  | mixed with source. | 
|  | If you want to see the source file, the profiled application needs to have debug information, and the source | 
|  | must be available through this debug information. For GCC, you must use the <code class="option">-g</code> option | 
|  | when you are compiling. | 
|  | If the binary doesn't contain sufficient debug information, you can still | 
|  | use <span class="command"><strong>opannotate <code class="option">--assembly</code></strong></span> to get annotated assembly. | 
|  | </p> | 
|  | <p> | 
|  | Note that for the reason explained in <a class="xref" href="#hardware-counters" title="4.1. Hardware performance counters">Section 4.1, “Hardware performance counters”</a> the results can be | 
|  | inaccurate. The debug information itself can add other problems; for example, the line number for a symbol can be | 
|  | incorrect. Assembly instructions can be re-ordered and moved by the compiler, and this can lead to | 
|  | crediting source lines with samples not really "owned" by this line. Also see | 
|  | <a class="xref" href="#interpreting" title="Chapter 5. Interpreting profiling results">Chapter 5, <i>Interpreting profiling results</i></a>. | 
|  | </p> | 
|  | <p> | 
|  | You can output the annotation to one single file, containing all the source found using the | 
|  | <code class="option">--source</code> option. You can use this in conjunction with <code class="option">--assembly</code> | 
|  | to get combined source/assembly output. | 
|  | </p> | 
|  | <p> | 
|  | You can also output a directory of annotated source files that maintains the structure of | 
|  | the original sources. Each line in the annotated source is prepended with the samples | 
|  | for that line. Additionally, each symbol is annotated giving details for the symbol | 
|  | as a whole. An example: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opannotate --source --output-dir=annotated /usr/local/oprofile-pp/bin/oprofiled | 
|  | $ ls annotated/home/moz/src/oprofile-pp/daemon/ | 
|  | opd_cookie.h  opd_image.c  opd_kernel.c  opd_sample_files.c  oprofiled.c | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Line numbers are maintained in the source files, but each file has | 
|  | a footer appended describing the profiling details. The actual annotation | 
|  | looks something like this: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | ... | 
|  |                :static uint64_t pop_buffer_value(struct transient * trans) | 
|  |  11510  1.9661 :{ /* pop_buffer_value total:  89901 15.3566 */ | 
|  |                :        uint64_t val; | 
|  |                : | 
|  |  10227  1.7469 :        if (!trans->remaining) { | 
|  |                :                fprintf(stderr, "BUG: popping empty buffer !\n"); | 
|  |                :                exit(EXIT_FAILURE); | 
|  |                :        } | 
|  |                : | 
|  |                :        val = get_buffer_value(trans->buffer, 0); | 
|  |   2281  0.3896 :        trans->remaining--; | 
|  |   2296  0.3922 :        trans->buffer += kernel_pointer_size; | 
|  |                :        return val; | 
|  |  10454  1.7857 :} | 
|  | ... | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | The first number on each line is the number of samples, whilst the second is | 
|  | the relative percentage of total samples. | 
|  | </p> | 
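|  | <p> | 
|  | Because every annotated line starts with its sample count, an annotated file can be | 
|  | post-processed with ordinary text tools. The sketch below lists the lines with the most | 
|  | samples; it assumes the "samples percentage :source" layout shown above and relies on | 
|  | line numbers being preserved. The function name <code class="function">hottest_lines</code> | 
|  | is hypothetical. | 
|  | </p> | 

```shell
#!/bin/sh
# hottest_lines FILE [N]
# Print the N (default 10) annotated lines with the most samples as
# tab-separated "samples  percent  line-number  source".
# (Hypothetical helper; assumes the annotation layout shown above.)
hottest_lines() {
    awk '
        {
            colon = index($0, ":")          # annotation ends at first colon
            if (colon == 0) next
            n = split(substr($0, 1, colon - 1), f, " ")
            if (n == 2 && f[1] ~ /^[0-9]+$/)
                print f[1] "\t" f[2] "\t" FNR "\t" substr($0, colon + 1)
        }
    ' "$1" | sort -k1,1nr | head -n "${2:-10}"
}

# Try it on a few lines copied from the example above.
sample=$(mktemp) || exit 1
cat > "$sample" <<'EOF'
               :static uint64_t pop_buffer_value(struct transient * trans)
 11510  1.9661 :{ /* pop_buffer_value total:  89901 15.3566 */
               :        uint64_t val;
 10227  1.7469 :        if (!trans->remaining) {
  2281  0.3896 :        trans->remaining--;
EOF
hottest_lines "$sample" 2
rm -f "$sample"
```
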
|  | <div class="sect2" title="3.1. Locating source files"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opannotate-finding-source"></a>3.1. Locating source files</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Of course, <span class="command"><strong>opannotate</strong></span> needs to be able to locate the source files | 
|  | for the binary image(s) in order to produce output. Some binary images have debug | 
|  | information where the given source file paths are relative, not absolute. You can | 
|  | specify search paths to look for these files (similar to <span class="command"><strong>gdb</strong></span>'s | 
|  | <code class="option">dir</code> command) with the <code class="option">--search-dirs</code> option. | 
|  | </p> | 
|  | <p> | 
|  | Sometimes you may have a binary image which gives absolute paths for the source files, | 
|  | but you have the actual sources elsewhere (commonly, you've installed an SRPM for | 
|  | a binary on your system and you want annotation from an existing profile). You can | 
|  | use the <code class="option">--base-dirs</code> option to redirect OProfile to look somewhere | 
|  | else for source files. For example, imagine we have a binary generated from a source | 
|  | file that is given in the debug information as <code class="filename">/tmp/build/libfoo/foo.c</code>, | 
|  | and you have the source tree matching that binary installed in <code class="filename">/home/user/libfoo/</code>. | 
|  | You can redirect OProfile to find <code class="filename">foo.c</code> correctly like this: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opannotate --source --base-dirs=/tmp/build/libfoo/ --search-dirs=/home/user/libfoo/ --output-dir=annotated/ /lib/libfoo.so | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | You can specify multiple (comma-separated) paths to both options. | 
|  | </p> | 
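|  | <p> | 
|  | The lookup described in this section can be sketched in shell as follows: strip the base | 
|  | directory from the front of the path recorded in the debug information, then look for the | 
|  | remainder under the search directory. The function <code class="function">resolve_source</code> | 
|  | is a hypothetical illustration that handles a single path per option; it is not | 
|  | <span class="command"><strong>opannotate</strong></span>'s actual implementation. | 
|  | </p> | 

```shell
#!/bin/sh
# resolve_source DEBUG_PATH BASE_DIR SEARCH_DIR
# Strip BASE_DIR from the front of DEBUG_PATH and look for the remainder
# under SEARCH_DIR. Prints the resolved path, or returns 1 if not found.
# (Hypothetical helper mirroring the description above; one path per option.)
resolve_source() {
    debug_path=$1
    base_dir=${2%/}      # tolerate a trailing slash
    search_dir=${3%/}
    case $debug_path in
        "$base_dir"/*) relative=${debug_path#"$base_dir"/} ;;
        *)             return 1 ;;   # prefix does not apply; sketch gives up
    esac
    if [ -f "$search_dir/$relative" ]; then
        printf '%s\n' "$search_dir/$relative"
    else
        return 1
    fi
}

# Recreate the layout from the example in a scratch directory.
scratch=$(mktemp -d) || exit 1
mkdir -p "$scratch/home/user/libfoo"
: > "$scratch/home/user/libfoo/foo.c"
resolve_source /tmp/build/libfoo/foo.c /tmp/build/libfoo/ \
    "$scratch/home/user/libfoo/"
rm -rf "$scratch"
```
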
|  | </div> | 
|  | <div class="sect2" title="3.2. Usage of opannotate"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opannotate-details"></a>3.2. Usage of <span class="command"><strong>opannotate</strong></span></h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--assembly / -a</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Output annotated assembly. If this is combined with --source, then mixed | 
|  | source / assembly annotations are output. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--base-dirs / -b [paths]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Comma-separated list of path prefixes. This can be used to point OProfile to a | 
|  | different location for source files when the debug information specifies an | 
|  | absolute source path that does not exist on your system. The prefix | 
|  | is stripped from the debug source file paths, and the remainder is then looked up | 
|  | in the directories specified by <code class="option">--search-dirs</code>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--demangle / -D none|normal|smart</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | none: no demangling. normal: use default demangler (default). smart: use | 
|  | pattern-matching to make C++ symbol demangling more readable. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--exclude-dependent / -x</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Do not include application-specific images for libraries, kernel modules | 
|  | and the kernel. This option only makes sense if the profile session | 
|  | used --separate. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--exclude-file [files]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Exclude all files in the given comma-separated list of glob patterns. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--exclude-symbols / -e [symbols]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Exclude all the symbols in the given comma-separated list. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--help / -? / --usage</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show help message. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--image-path / -p [paths]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Comma-separated list of additional paths to search for binaries. | 
|  | This is needed to find modules in kernels 2.6 and upwards. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--root / -R [path]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | A path to a filesystem to search for additional binaries. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--include-file [files]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only include files in the given comma-separated list of glob patterns. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--include-symbols / -i [symbols]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only include symbols in the given comma-separated list. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--objdump-params [params]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Pass the given parameters as extra values when calling objdump. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--output-dir / -o [dir]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Output directory. This makes opannotate output one annotated file for each | 
|  | source file. This option can't be used in conjunction with --assembly. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--search-dirs / -d [paths]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Comma-separated list of paths to search for source files. This is useful to find | 
|  | source files when the debug information only contains relative paths. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--source / -s</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Output annotated source. This requires debugging information to be available | 
|  | for the binaries. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--threshold / -t [percentage]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only output data for symbols that have more than the given percentage | 
|  | of total samples. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--verbose / -V [options]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Give verbose debugging output. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--version / -v</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show version. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
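|  | <p> | 
|  | The interaction between <code class="option">--base-dirs</code> and <code class="option">--search-dirs</code> | 
|  | described above can be sketched as follows. This is only an illustrative model of the documented | 
|  | behaviour, not opannotate's actual implementation: the function name <code class="function">resolve_source</code>, | 
|  | its parameters and the example paths are assumptions made for this sketch. | 
|  | </p> | 

```python
import os
import posixpath


def resolve_source(debug_path, base_dirs, search_dirs, exists=os.path.isfile):
    """Return the first existing candidate for debug_path, or None.

    Hypothetical model: strip the first matching base-dir prefix from
    the path recorded in the debug information, then look for the
    remainder under each search dir, in the order given.
    """
    relative = debug_path
    for base in base_dirs:
        prefix = base.rstrip("/") + "/"
        if debug_path.startswith(prefix):
            relative = debug_path[len(prefix):]
            break
    for directory in search_dirs:
        # posixpath.join returns an absolute `relative` unchanged, so a
        # path with no matching prefix is simply checked as recorded.
        candidate = posixpath.join(directory, relative)
        if exists(candidate):
            return candidate
    return None
```

|  | <p> | 
|  | For example, with a base dir of <code class="filename">/build/tree</code> and a search dir of | 
|  | <code class="filename">/home/me/src</code>, a recorded path of <code class="filename">/build/tree/lib/foo.c</code> | 
|  | would be looked up as <code class="filename">/home/me/src/lib/foo.c</code>. | 
|  | </p> | 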
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="4. OProfile results with JIT samples"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="getting-jit-reports"></a>4. OProfile results with JIT samples</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | After profiling a Java (or other supported VM) application, the command | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"><span xmlns="http://www.w3.org/1999/xhtml" class="command"><strong>"opcontrol --dump"</strong></span> </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | flushes the sample buffers and creates ELF binaries from the | 
|  | intermediate files that were written by the agent library. | 
|  | The ELF binaries are named <code class="filename">&lt;tgid&gt;.jo</code>. | 
|  | With the symbol information stored in these ELF files, it is | 
|  | possible to map samples to the appropriate symbols. | 
|  | </p> | 
|  | <p> | 
|  | The usual analysis tools (<span class="command"><strong>opreport</strong></span> and/or | 
|  | <span class="command"><strong>opannotate</strong></span>) can now be used | 
|  | to get symbols and assembly code for the instrumented VM processes. | 
|  | </p> | 
|  | <p> | 
|  | Below is an example of a profile report of a Java application that has been | 
|  | instrumented with the provided agent library. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opreport -l /usr/lib/jvm/jre-1.5.0-ibm/bin/java | 
|  | CPU: Core Solo / Duo, speed 2167 MHz (estimated) | 
|  | Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000 | 
|  | samples  %        image name               symbol name | 
|  | 186020   50.0523  no-vmlinux               no-vmlinux               (no symbols) | 
|  | 34333     9.2380  7635.jo                  java                     void test.f1() | 
|  | 19022     5.1182  libc-2.5.so              libc-2.5.so              _IO_file_xsputn@@GLIBC_2.1 | 
|  | 18762     5.0483  libc-2.5.so              libc-2.5.so              vfprintf | 
|  | 16408     4.4149  7635.jo                  java                     void test$HelloThread.run() | 
|  | 16250     4.3724  7635.jo                  java                     void test$test_1.f2(int) | 
|  | 15303     4.1176  7635.jo                  java                     void test.f2(int, int) | 
|  | 13252     3.5657  7635.jo                  java                     void test.f2(int) | 
|  | 5165      1.3897  7635.jo                  java                     void test.f4() | 
|  | 955       0.2570  7635.jo                  java                     void test$HelloThread.run()~ | 
|  |  | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
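|  | <p> | 
|  | To see how much of such a report belongs to JITed code as a whole, the rows can be summed per image. | 
|  | The following is a minimal sketch that assumes the whitespace-separated layout of the example report | 
|  | (sample count, percentage, image name, then further columns); | 
|  | <code class="function">samples_by_image</code> is a hypothetical helper, not part of OProfile. | 
|  | </p> | 

```python
from collections import defaultdict


def samples_by_image(report_lines):
    """Sum the sample counts of `opreport -l` rows per image name."""
    totals = defaultdict(int)
    for line in report_lines:
        fields = line.split()
        # Data rows start with an integer sample count; header lines
        # such as "CPU: ..." or "samples  % ..." do not and are skipped.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        image = fields[2]
        totals[image] += int(fields[0])
    return dict(totals)
```

|  | <p> | 
|  | Fed with the report lines, the entry for <code class="filename">7635.jo</code> then gives the | 
|  | number of samples attributed to JIT-compiled Java methods of that process. | 
|  | </p> | 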
|  | <p> | 
|  | </p> | 
|  | <div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"> | 
|  | <h3 class="title">Note</h3> | 
|  | <p> | 
|  | Depending on the JVM that is used, certain options of opreport and opannotate | 
|  | do NOT work since they rely on debug information (e.g. source code line number) | 
|  | that is not always available. The Sun JVM does provide the necessary debug | 
|  | information via the JVMTI[PI] interface, | 
|  | but other JVMs do not. | 
|  | </p> | 
|  | </div> | 
|  | <p> | 
|  | As you can see in the opreport output, the JIT support agent for Java | 
|  | generates symbols that include the class and method signature. | 
|  | A symbol with the suffix ~&lt;n&gt; (e.g. | 
|  | <code class="code">void test$HelloThread.run()~1</code>) means that this is | 
|  | the &lt;n&gt;th occurrence of the identical name. This happens if a method is re-JITed. | 
|  | A symbol with the suffix %&lt;n&gt; means that the address space of this symbol | 
|  | was reused during the sample session (see <a class="xref" href="#overlapping-symbols" title="6. Overlapping symbols in JITed code">Section 6, “Overlapping symbols in JITed code”</a>). | 
|  | The value &lt;n&gt; is the percentage of time that this symbol/code was present in | 
|  | relation to the total lifetime of all other overlapping symbols. A symbol of the form | 
|  | <code class="code">&lt;return_val&gt; &lt;class_name&gt;$&lt;method_sig&gt;</code> denotes an | 
|  | inner class. | 
|  | </p> | 
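|  | <p> | 
|  | When post-processing such reports, the two suffixes can be separated from the method name | 
|  | mechanically. The sketch below assumes that the suffix is always a tilde or percent sign followed | 
|  | by a decimal integer at the very end of the symbol; <code class="function">split_jit_suffix</code> | 
|  | is a hypothetical helper introduced here for illustration. | 
|  | </p> | 

```python
import re

# Name, optionally followed by "~<n>" or "%<n>" at the end of the symbol.
_SUFFIX = re.compile(r"^(?P<name>.*?)(?:(?P<kind>[~%])(?P<n>\d+))?$")


def split_jit_suffix(symbol):
    """Split a JIT symbol into (name, kind, n).

    kind is "~" (n-th occurrence of the same name), "%" (share of the
    overlapping lifetime, in percent) or None when there is no suffix.
    """
    match = _SUFFIX.match(symbol)
    kind = match.group("kind")
    n = int(match.group("n")) if kind else None
    return match.group("name"), kind, n
```

|  | <p> | 
|  | Grouping on the returned name makes it possible to add up the samples of all re-JITed versions of one method. | 
|  | </p> | 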
|  | </div> | 
|  | <div class="sect1" title="5. gprof-compatible output (opgprof)"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="opgprof"></a>5. <span class="command"><strong>gprof</strong></span>-compatible output (<span class="command"><strong>opgprof</strong></span>)</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | If you're familiar with the output produced by <span class="command"><strong>GNU gprof</strong></span>, | 
|  | you may find <span class="command"><strong>opgprof</strong></span> useful. It takes a single binary | 
|  | as an argument, and produces a <code class="filename">gmon.out</code> file for use | 
|  | with <span class="command"><strong>gprof -p</strong></span>. If call-graph profiling is enabled, | 
|  | then this is also included. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opgprof `which oprofiled` # generates gmon.out file | 
|  | $ gprof -p `which oprofiled` | head | 
|  | Flat profile: | 
|  |  | 
|  | Each sample counts as 1 samples. | 
|  | %   cumulative   self              self     total | 
|  | time   samples   samples    calls  T1/call  T1/call  name | 
|  | 33.13 206237.00 206237.00                             odb_insert | 
|  | 22.67 347386.00 141149.00                             pop_buffer_value | 
|  | 9.56 406881.00 59495.00                             opd_put_sample | 
|  | 7.34 452599.00 45718.00                             opd_find_image | 
|  | 7.19 497327.00 44728.00                             opd_process_samples | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <div class="sect2" title="5.1. Usage of opgprof"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opgprof-details"></a>5.1. Usage of <span class="command"><strong>opgprof</strong></span></h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--help / -? / --usage</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show help message. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--image-path / -p [paths]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Comma-separated list of additional paths to search for binaries. | 
|  | This is needed to find modules in kernels 2.6 and upwards. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--root / -R [path]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | A path to a filesystem to search for additional binaries. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--output-filename / -o [file]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Output to the given file instead of the default, <code class="filename">gmon.out</code>. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--threshold / -t [percentage]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only output data for symbols that have more than the given percentage | 
|  | of total samples. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--verbose / -V [options]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Give verbose debugging output. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--version / -v</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show version. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="6. Archiving measurements (oparchive)"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="oparchive"></a>6. Archiving measurements (<span class="command"><strong>oparchive</strong></span>)</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The <span class="command"><strong>oparchive</strong></span> utility generates a directory populated | 
|  | with executable, debug, and oprofile sample files. This directory can be | 
|  | moved to another machine via <span class="command"><strong>tar</strong></span> and analyzed without | 
|  | further use of the data collection machine. | 
|  | </p> | 
|  | <p> | 
|  | The following command would collect the sample files, the executables | 
|  | associated with the sample files, and the debuginfo files associated | 
|  | with the executables and copy them into | 
|  | <code class="filename">/tmp/current_data</code>: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # oparchive -o /tmp/current_data | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
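|  | <p> | 
|  | Moving the resulting directory to another machine then only requires packing it up. As a minimal | 
|  | sketch, assuming the directory was created as above, Python's standard | 
|  | <code class="literal">tarfile</code> module can be used in place of the <span class="command"><strong>tar</strong></span> | 
|  | command; the function name <code class="function">pack_archive</code> and the tarball name are assumptions of this example. | 
|  | </p> | 

```python
import os
import tarfile


def pack_archive(archive_dir, tarball_path):
    """Pack an oparchive output directory into a gzip-compressed tarball."""
    # Store entries under the directory's own name so that unpacking
    # recreates a single top-level directory on the analysis machine.
    top_level = os.path.basename(os.path.normpath(archive_dir))
    with tarfile.open(tarball_path, "w:gz") as tar:
        tar.add(archive_dir, arcname=top_level)
    return tarball_path
```

|  | <p> | 
|  | A call such as <code class="code">pack_archive("/tmp/current_data", "/tmp/current_data.tar.gz")</code> | 
|  | produces a single file that can be copied to the analysis machine and unpacked there. | 
|  | </p> | 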
|  | <div class="sect2" title="6.1. Usage of oparchive"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="oparchive-details"></a>6.1. Usage of <span class="command"><strong>oparchive</strong></span></h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--help / -? / --usage</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show help message. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--exclude-dependent / -x</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Do not include application-specific images for libraries, kernel modules | 
|  | and the kernel. This option only makes sense if the profile session | 
|  | used --separate. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--image-path / -p [paths]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Comma-separated list of additional paths to search for binaries. | 
|  | This is needed to find modules in kernels 2.6 and upwards. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--root / -R [path]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | A path to a filesystem to search for additional binaries. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--output-directory / -o [directory]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Output to the given directory. There is no default. This must be specified. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--list-files / -l</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Only list the files that would be archived, don't copy them. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--verbose / -V [options]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Give verbose debugging output. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--version / -v</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show version. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="7. Converting sample database files (opimport)"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="opimport"></a>7. Converting sample database files (<span class="command"><strong>opimport</strong></span>)</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | This utility converts sample database files from a foreign binary format (abi) to | 
|  | the native format. This is useful only when moving sample files between hosts, | 
|  | for analysis on platforms other than the one used for collection. The abi format | 
|  | of the file to be imported is described in a text file located in <code class="filename">$SESSION_DIR/abi</code>. | 
|  | </p> | 
|  | <p> | 
|  | The following command would convert the input sample files to the | 
|  | output sample files, using the given abi file as the binary description | 
|  | of the input file and the current platform abi as the binary description | 
|  | of the output file. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | # opimport -a /var/lib/oprofile/abi -o /tmp/current/.../GLOBAL_POWER_EVENTS.200000.1.all.all.all /var/lib/.../mprime/GLOBAL_POWER_EVENTS.200000.1.all.all.all | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <div class="sect2" title="7.1. Usage of opimport"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="opimport-details"></a>7.1. Usage of <span class="command"><strong>opimport</strong></span></h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="variablelist"> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--help / -? / --usage</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show help message. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--abi / -a [filename]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Input abi file description location. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--force / -f</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Force conversion even if the input and output abi are identical. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--output / -o [filename]</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Specify the output filename. If the output file already exists, it is | 
|  | not overwritten; instead, the new data are accumulated into it. Sample filenames are | 
|  | meaningful to the post-profiling tools and must be kept identical: in other words, the pathname | 
|  | from the first path component containing a '{' must be kept as is in the | 
|  | output filename. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--verbose / -V</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Give verbose debugging output. | 
|  | </p> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="term"> | 
|  | <code class="option">--version / -v</code> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <p> | 
|  | Show version. | 
|  | </p> | 
|  | </dd> | 
|  | </dl> | 
|  | </div> | 
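|  | <p> | 
|  | The naming rule for <code class="option">--output</code> can be illustrated with a short sketch | 
|  | that derives an output filename from an input sample filename. The function | 
|  | <code class="function">opimport_output_path</code> and the sample path used in the usage note are | 
|  | made up for this example; only the rule itself (keep everything from the first path component | 
|  | containing a '{') comes from the description above. | 
|  | </p> | 

```python
import posixpath


def opimport_output_path(input_path, output_root):
    """Build an output path that keeps the informative part of input_path."""
    parts = input_path.split("/")
    for index, part in enumerate(parts):
        if "{" in part:
            # Everything from this component onwards must be preserved.
            return posixpath.join(output_root, *parts[index:])
    raise ValueError("no path component containing '{' in " + input_path)
```

|  | <p> | 
|  | With a hypothetical input of <code class="filename">/var/lib/oprofile/samples/current/{root}/usr/bin/mprime/EVENT.1</code> | 
|  | and an output root of <code class="filename">/tmp/current</code>, the result is | 
|  | <code class="filename">/tmp/current/{root}/usr/bin/mprime/EVENT.1</code>. | 
|  | </p> | 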
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="chapter" title="Chapter 5. Interpreting profiling results"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title"><a id="interpreting"></a>Chapter 5. Interpreting profiling results</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="toc"> | 
|  | <p> | 
|  | <b>Table of Contents</b> | 
|  | </p> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#irq-latency">1. Profiling interrupt latency</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#kernel-profiling">2. Kernel profiling</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#irq-masking">2.1. Interrupt masking</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#idle">2.2. Idle time</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#kernel-modules">2.3. Profiling kernel modules</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#interpreting-callgraph">3. Interpreting call-graph profiles</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#debug-info">4. Inaccuracies in annotated source</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dd> | 
|  | <dl> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#effect-of-optimizations">4.1. Side effects of optimizations</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#prologues">4.2. Prologues and epilogues</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#inlined-function">4.3. Inlined functions</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect2"> | 
|  | <a href="#wrong-linenr-info">4.4. Inaccuracy in line number information</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </dd> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#symbol-without-debug-info">5. Assembly functions</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#overlapping-symbols">6. Overlapping symbols in JITed code</a> | 
|  | </span> | 
|  | </dt> | 
|  | <dt> | 
|  | <span class="sect1"> | 
|  | <a href="#hidden-cost">7. Other discrepancies</a> | 
|  | </span> | 
|  | </dt> | 
|  | </dl> | 
|  | </div> | 
|  | <p> | 
|  | The standard caveats of profiling apply in interpreting the results from OProfile: | 
|  | profile realistic situations, profile different scenarios, profile | 
|  | for as long a time as possible, avoid system-specific artifacts, don't trust | 
|  | the profile data too much. Also bear in mind the comments on the performance | 
|  | counters above - you <span class="emphasis"><em>cannot</em></span> rely on totally accurate | 
|  | instruction-level profiling.  However, for almost all circumstances the data | 
|  | can be useful. Ideally a utility such as Intel's VTUNE would be available to | 
|  | allow careful instruction-level analysis; go hassle Intel for this, not me ;) | 
|  | </p> | 
|  | <div class="sect1" title="1. Profiling interrupt latency"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="irq-latency"></a>1. Profiling interrupt latency</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | This is an example of how the latency of delivery of profiling interrupts | 
|  | can impact the reliability of the profiling data. This is pretty much a | 
|  | worst-case-scenario example: these problems are fairly rare. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | double fun(double a, double b, double c) | 
|  | { | 
|  |     double result = 0; | 
|  |     for (int i = 0 ; i &lt; 10000; ++i) { | 
|  |         result += a; | 
|  |         result *= b; | 
|  |         result /= c; | 
|  |     } | 
|  |     return result; | 
|  | } | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Here the last instruction of the loop is very costly, and you would expect the results | 
|  | to reflect that. However (showing only the instructions inside the loop): | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opannotate -a -t 10 ./a.out | 
|  |  | 
|  | 88 15.38% : 8048337:       fadd   %st(3),%st | 
|  | 48 8.391% : 8048339:       fmul   %st(2),%st | 
|  | 68 11.88% : 804833b:       fdiv   %st(1),%st | 
|  | 368 64.33% : 804833d:       inc    %eax | 
|  | : 804833e:       cmp    $0x270f,%eax | 
|  | : 8048343:       jle    8048337 | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | The problem comes from the x86 hardware: when the counter overflows, the IRQ | 
|  | is asserted, but the hardware has features that can delay the NMI. | 
|  | x86 hardware is synchronous (i.e. it cannot interrupt during an instruction); | 
|  | there is also a latency when the IRQ is asserted, and the multiple | 
|  | execution units and the out-of-order model of modern x86 CPUs also cause | 
|  | problems. This is the same function, with source annotation: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | $ opannotate -s -t 10 ./a.out | 
|  |  | 
|  | :double fun(double a, double b, double c) | 
|  | :{ /* _Z3funddd total:     572 100.0% */ | 
|  | : double result = 0; | 
|  | 368 64.33% : for (int i = 0 ; i < 10000; ++i) { | 
|  | 88 15.38% :  result += a; | 
|  | 48 8.391% :  result *= b; | 
|  | 68 11.88% :  result /= c; | 
|  | : } | 
|  | : return result; | 
|  | :} | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | The conclusion: don't trust samples coming at the end of a loop, | 
|  | particularly if the last instruction generated by the compiler is costly. This | 
|  | case can also occur for branches. Always bear in mind that a sample | 
|  | can be delayed by a few cycles from its real position. That's a hardware | 
|  | problem and OProfile can do nothing about it. | 
|  | </p> | 
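|  | <p> | 
|  | The shift seen in the listings above can be reproduced with a deliberately simplified model in | 
|  | which an interrupt is never taken in the middle of an instruction, so every sample is charged to | 
|  | the instruction that follows the one actually running. The cycle costs below are invented for | 
|  | illustration and the model ignores out-of-order execution and delivery latency; | 
|  | <code class="function">attribute_samples</code> is a hypothetical name. | 
|  | </p> | 

```python
def attribute_samples(instructions, delivered_at_boundary):
    """Count samples per instruction over one loop iteration.

    instructions: list of (name, cycles); the cycle costs are made up.
    One sample is taken on every cycle. If delivered_at_boundary is
    true, the interrupt waits for the running instruction to finish,
    so the sample is charged to the instruction that follows it.
    """
    counts = {name: 0 for name, _ in instructions}
    for index, (name, cycles) in enumerate(instructions):
        if delivered_at_boundary:
            # Wrap around: after the final jump the loop starts again.
            target = instructions[(index + 1) % len(instructions)][0]
        else:
            target = name
        counts[target] += cycles
    return counts


# Hypothetical costs for the loop body shown in the assembly listing.
LOOP = [("fadd", 3), ("fmul", 5), ("fdiv", 20),
        ("inc", 1), ("cmp", 1), ("jle", 1)]
```

|  | <p> | 
|  | In this model <code class="code">attribute_samples(LOOP, True)</code> charges the cycles of the | 
|  | expensive division to the cheap increment that follows it, which is the pattern visible in the | 
|  | annotated assembly. | 
|  | </p> | 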
|  | </div> | 
|  | <div class="sect1" title="2. Kernel profiling"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="kernel-profiling"></a>2. Kernel profiling</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect2" title="2.1. Interrupt masking"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="irq-masking"></a>2.1. Interrupt masking</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | OProfile uses non-maskable interrupts (NMI) on the P6 generation, Pentium 4, | 
|  | Athlon, Opteron, Phenom, and Turion processors. These interrupts can occur even in sections of the | 
|  | Linux kernel where interrupts are disabled, allowing collection of samples in virtually | 
|  | all executable code.  The RTC, timer interrupt mode, and Itanium 2 collection mechanisms | 
|  | use maskable interrupts. Thus, the RTC and Itanium 2 data collection mechanisms have "sample | 
|  | shadows", or blind spots: regions where no samples will be collected. Typically, the samples | 
|  | will be attributed to the code immediately after the interrupts are re-enabled. | 
|  | </p> | 
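|  | <p> | 
|  | A small simulation makes the effect of a sample shadow concrete. It is a sketch under simplifying | 
|  | assumptions: time is divided into equal ticks, a sample is requested every | 
|  | <code class="varname">period</code> ticks, and a held-back maskable interrupt is always delivered | 
|  | (never merged or lost) at the first tick that runs with interrupts enabled. The region names and | 
|  | the function <code class="function">attribute_ticks</code> are hypothetical. | 
|  | </p> | 

```python
def attribute_ticks(timeline, period, maskable):
    """Count samples per region for a periodic sampling interrupt.

    timeline: list of (region, duration, irqs_enabled) in execution order.
    A maskable interrupt raised while interrupts are disabled is held
    back and charged to the next region that runs with them enabled.
    """
    # Expand the timeline into one (region, irqs_enabled) entry per tick.
    ticks = [(region, enabled)
             for region, duration, enabled in timeline
             for _ in range(duration)]
    counts = {}
    for start in range(0, len(ticks), period):
        position = start
        if maskable:
            while position < len(ticks) and not ticks[position][1]:
                position += 1
        if position < len(ticks):
            region = ticks[position][0]
            counts[region] = counts.get(region, 0) + 1
    return counts
```

|  | <p> | 
|  | Running the same timeline with <code class="code">maskable=False</code> (the NMI case) and | 
|  | <code class="code">maskable=True</code> shows the samples of an interrupts-disabled region | 
|  | moving to the region that re-enables interrupts. | 
|  | </p> | 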
|  | </div> | 
|  | <div class="sect2" title="2.2. Idle time"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="idle"></a>2.2. Idle time</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Your kernel is likely to support halting the processor when a CPU is idle. As | 
|  | the typical hardware events like <code class="constant">CPU_CLK_UNHALTED</code> do not | 
|  | count when the CPU is halted, the kernel profile will not reflect the actual | 
|  | amount of time spent idle. You can change this behaviour by booting with | 
|  | the <code class="option">idle=poll</code> option, which uses a different idle routine. This | 
|  | will appear as <code class="function">poll_idle()</code> in your kernel profile. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="2.3. Profiling kernel modules"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="kernel-modules"></a>2.3. Profiling kernel modules</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | OProfile profiles kernel modules by default. However, there are a couple of problems | 
|  | you may have when trying to get results. First, you may have booted via an initrd; | 
|  | this means that the actual path for the module binaries cannot be determined automatically. | 
|  | To get around this, you can use the <code class="option">-p</code> option to the profiling tools | 
|  | to specify where to look for the kernel modules. | 
|  | </p> | 
|  | <p> | 
|  | In 2.6 kernels, the information on where kernel module binaries are located has been removed. | 
|  | This means OProfile needs guiding with the <code class="option">-p</code> option to find your | 
|  | modules. Normally, you can just use your standard module top-level directory for this. | 
|  | Note that due to this problem, OProfile cannot check that the modification times match; | 
|  | it is your responsibility to make sure you do not modify a binary after a profile | 
|  | has been created. | 
|  | </p> | 
|  | <p> | 
|  | If you have run <span class="command"><strong>insmod</strong></span> or <span class="command"><strong>modprobe</strong></span> to insert a module | 
|  | from a particular directory, it is important that you specify this directory with the | 
|  | <code class="option">-p</code> option first, so that it overrides an older module binary that might | 
|  | exist in other directories you've specified with <code class="option">-p</code>. It is up to you | 
|  | to make sure that these values are correct: 2.6 kernels simply do not provide enough | 
|  | information for OProfile to work this out itself. | 
|  | </p> | 
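|  | <p> | 
|  | Why the order of the <code class="option">-p</code> directories matters can be seen from a | 
|  | first-match lookup. The sketch below is not OProfile's actual search code; it assumes that module | 
|  | binaries are files named <code class="filename">&lt;module&gt;.ko</code> and that each directory | 
|  | is searched recursively in the order given. <code class="function">find_module_binary</code> is a | 
|  | hypothetical name. | 
|  | </p> | 

```python
import os


def find_module_binary(module_name, image_paths):
    """Return the first file named <module_name>.ko under image_paths.

    The directories are tried in the order given, mirroring the advice
    to list the directory of the module actually loaded first.
    """
    wanted = module_name + ".ko"
    for top in image_paths:
        for dirpath, dirnames, filenames in os.walk(top):
            dirnames.sort()  # make the walk order deterministic
            if wanted in filenames:
                return os.path.join(dirpath, wanted)
    return None
```

|  | <p> | 
|  | If both a freshly built module and an older installed copy exist, whichever directory is listed | 
|  | first determines which binary is used for the report. | 
|  | </p> | 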
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="3. Interpreting call-graph profiles"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="interpreting-callgraph"></a>3. Interpreting call-graph profiles</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Sometimes the results from call-graph profiles may be different to what | 
|  | you expect to see. The first thing to check is whether the target | 
|  | binaries were compiled with frame pointers enabled (if the binary was | 
|  | compiled using <span class="command"><strong>gcc</strong></span>'s | 
|  | <code class="option">-fomit-frame-pointer</code> option, you will not get | 
|  | meaningful results). Note that as of this writing, the GCC developers | 
|  | plan to disable frame pointers by default. The Linux kernel is built | 
|  | without frame pointers by default; there is a configuration option you | 
|  | can use to turn it on under the "Kernel Hacking" menu. | 
|  | </p> | 
|  | <p> | 
|  | Often you may see a caller of a function that does not actually directly | 
|  | call the function you're looking at (e.g. if <code class="function">a()</code> | 
|  | calls <code class="function">b()</code>, which in turn calls | 
|  | <code class="function">c()</code>, you may see an entry for | 
|  | <code class="function">a()->c()</code>).  What's actually occurring is that we | 
|  | are taking samples at the very start (or the very end) of | 
|  | <code class="function">c()</code>; at these few instructions, we haven't yet | 
|  | created the new function's frame, so it appears as if | 
|  | <code class="function">a()</code> is calling directly into | 
|  | <code class="function">c()</code>. Be careful not to be misled by these | 
|  | entries. | 
|  | </p> | 
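|  | <p> | 
|  | The missing intermediate caller can be reproduced with a toy model of frame-pointer unwinding. | 
|  | The sketch assumes that each frame stores the caller's frame pointer and the function to return | 
|  | into, and that the unwinder starts from the sampled function and the current frame pointer; the | 
|  | frame names and the function <code class="function">backtrace</code> are made up for this example. | 
|  | </p> | 

```python
def backtrace(current_function, frame_pointer, frames):
    """Walk a frame-pointer chain and return the functions found.

    frames maps a frame id to (saved_frame_pointer, return_function),
    i.e. the caller's frame pointer and the function returned into.
    """
    trace = [current_function]
    while frame_pointer is not None:
        saved_frame_pointer, return_function = frames[frame_pointer]
        trace.append(return_function)
        frame_pointer = saved_frame_pointer
    return trace


# main() called a(), a() called b(), b() called c().
FRAMES = {
    "frame_a": (None, "main"),
    "frame_b": ("frame_a", "a"),
    "frame_c": ("frame_b", "b"),
}
```

|  | <p> | 
|  | Once <code class="function">c()</code> has set up its frame, | 
|  | <code class="code">backtrace("c", "frame_c", FRAMES)</code> yields c, b, a, main. A sample taken | 
|  | before that point still sees the frame pointer of <code class="function">b()</code>, and | 
|  | <code class="code">backtrace("c", "frame_b", FRAMES)</code> yields c, a, main, with | 
|  | <code class="function">b()</code> skipped. | 
|  | </p> | 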
|  | <p> | 
|  | Like the rest of OProfile, call-graph profiling uses a statistical | 
|  | approach; this means that sometimes a backtrace sample is truncated, or | 
|  | even partially wrong. Bear this in mind when examining results. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect1" title="4. Inaccuracies in annotated source"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="debug-info"></a>4. Inaccuracies in annotated source</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect2" title="4.1. Side effects of optimizations"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="effect-of-optimizations"></a>4.1. Side effects of optimizations</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The compiler can introduce some pitfalls in the annotated source output. | 
|  | The optimizer can move pieces of code in such a manner that two lines of code | 
|  | are interlaced (instruction scheduling). The debug info generated by the compiler | 
|  | can also show strange behavior. This is especially true for complex expressions, e.g. inside | 
|  | an if statement: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | if (a && .. | 
|  | b && .. | 
|  | c &&) | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Here the problem comes from the position of the line number. The available debug | 
|  | info does not give enough detail for the if condition, so all samples are | 
|  | accumulated at the position of the right brace of the expression. Using | 
|  | <span class="command"><strong>opannotate <code class="option">-a</code></strong></span> can help to show the real | 
|  | samples at the assembly level. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="4.2. Prologues and epilogues"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="prologues"></a>4.2. Prologues and epilogues</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | The compiler generally needs to generate "glue" code across function calls, dependent | 
|  | on the particular function call conventions used. Additionally other things | 
|  | need to happen, like stack pointer adjustment for the local variables; this | 
|  | code is known as the function prologue. Similar code is needed at function return, | 
|  | and is known as the function epilogue. This will show up in annotations as | 
|  | samples at the very start and end of a function, where there is no apparent | 
|  | executable code in the source. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="4.3. Inlined functions"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="inlined-function"></a>4.3. Inlined functions</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | You may see that a function is credited with a certain number of samples, but | 
|  | the listing does not add up to the correct total. To pick a real example: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | :internal_sk_buff_alloc_security(struct sk_buff *skb) | 
|  | 353 2.342%    :{ /* internal_sk_buff_alloc_security total: 1882 12.48% */ | 
|  | : | 
|  | :        sk_buff_security_t *sksec; | 
|  | 15 0.0995%   :        int rc = 0; | 
|  | : | 
|  | 10 0.06633%  :        sksec = skb->lsm_security; | 
|  | 468 3.104%    :        if (sksec && sksec->magic == DSI_MAGIC) { | 
|  | :                goto out; | 
|  | :        } | 
|  | : | 
|  | :        sksec = (sk_buff_security_t *) get_sk_buff_memory(skb); | 
|  | 3 0.0199%   :        if (!sksec) { | 
|  | 38 0.2521%   :                rc = -ENOMEM; | 
|  | :                goto out; | 
|  | 10 0.06633%  :        } | 
|  | :        memset(sksec, 0, sizeof (sk_buff_security_t)); | 
|  | 44 0.2919%   :        sksec->magic = DSI_MAGIC; | 
|  | 32 0.2123%   :        sksec->skb = skb; | 
|  | 45 0.2985%   :        sksec->sid = DSI_SID_NORMAL; | 
|  | 31 0.2056%   :        skb->lsm_security = sksec; | 
|  | : | 
|  | :      out: | 
|  | : | 
|  | 146 0.9685%   :        return rc; | 
|  | : | 
|  | 98 0.6501%   :} | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Here, the function is credited with 1,882 samples, but the annotations | 
|  | below do not account for this. This is usually because of inline functions - | 
|  | the compiler marks such code with debug entries for the inline function | 
|  | definition, and this is where <span class="command"><strong>opannotate</strong></span> annotates | 
|  | such samples. In the case above, <code class="function">memset</code> is the most | 
|  | likely candidate for this problem. Examining the mixed source/assembly | 
|  | output can help identify such results. | 
|  | </p> | 
|  | <p> | 
|  | This problem is more visible when there is no source file available. In the | 
|  | following example the per-symbol totals sum to 125 samples (84 + 21 + 1 + 1 + 18), | 
|  | whereas the file itself is credited with only 109 samples. The difference is | 
|  | accounted for by inline functions. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | /* | 
|  | * Total samples for file : "arch/i386/kernel/process.c" | 
|  | * | 
|  | *    109  2.4616 | 
|  | */ | 
|  |  | 
|  | /* default_idle total:     84  1.8970 */ | 
|  | /* cpu_idle total:         21  0.4743 */ | 
|  | /* flush_thread total:      1  0.0226 */ | 
|  | /* prepare_to_copy total:   1  0.0226 */ | 
|  | /* __switch_to total:      18  0.4065 */ | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | The missing samples are not lost; they are credited to the source | 
|  | location where the inlined function is defined. Samples for an inlined function | 
|  | are collected from all of its call sites and merged in one place in the annotated | 
|  | source file, so there is no way to see which call site the samples for an | 
|  | inlined function come from. | 
|  | </p> | 
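|  | <p> | 
|  | The merging of call sites can be pictured with a minimal C sketch. The helper <code class="function">clear_words()</code>, the <code class="function">record_init()</code> and <code class="function">record_reset()</code> callers and the structure layout are all hypothetical names invented for this illustration. If the compiler inlines the helper, samples taken in its loop are annotated at the helper's definition, whichever caller was running at the time. | 
|  | </p> | 

```c
#include <stddef.h>

/* Hypothetical helper; in a real project it would typically live in a header. */
static inline void clear_words(unsigned int *p, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        p[i] = 0;   /* when inlined, samples from BOTH callers are annotated here */
}

struct record {
    unsigned int magic;
    unsigned int payload[64];
};

/* Call site 1: clears the payload and stamps the record. */
void record_init(struct record *r)
{
    clear_words(r->payload, 64);
    r->magic = 0x1234u;
}

/* Call site 2: clears the payload only. */
void record_reset(struct record *r)
{
    clear_words(r->payload, 64);
}
```

|  | <p> | 
|  | In the annotated source, neither caller would show the cost of the clearing loop on its own lines, which is why the per-line figures of a function may not add up to its total. | 
|  | </p> | 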
|  | <p> | 
|  | When running <span class="command"><strong>opannotate</strong></span>, you may get a warning | 
|  | "some functions compiled without debug information may have incorrect source line attributions". | 
|  | In some rare cases, OProfile is not able to verify that the derived source line | 
|  | is correct (when some parts of the binary image are compiled without debugging | 
|  | information). Be wary of results if this warning appears. | 
|  | </p> | 
|  | <p> | 
|  | Furthermore, for some languages the compiler can implicitly generate functions, | 
|  | such as default copy constructors. Such functions are labelled by the compiler | 
|  | as having a line number of 0, which means the source annotation can be confusing. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect2" title="4.4. Inaccuracy in line number information"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h3 class="title"><a id="wrong-linenr-info"></a>4.4. Inaccuracy in line number information</h3> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Depending on your compiler, you may run into the following problem: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | struct big_object { int a[500]; }; | 
|  |  | 
|  | int main() | 
|  | { | 
|  |         big_object a, b; | 
|  |         for (int i = 0 ; i != 1000 * 1000; ++i) | 
|  |                 b = a; | 
|  |         return 0; | 
|  | } | 
|  |  | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Compiled with <span class="command"><strong>gcc</strong></span> 3.0.4 the annotated source is clearly inaccurate: | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | :int main() | 
|  | :{  /* main total: 7871 100% */ | 
|  | :        big_object a, b; | 
|  | :        for (int i = 0 ; i != 1000 * 1000; ++i) | 
|  | :                b = a; | 
|  | 7871 100%     :        return 0; | 
|  | :} | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | The problem here is distinct from the IRQ latency problem: the debug line number | 
|  | information is not precise enough. Again, looking at the output of <span class="command"><strong>opannotate -as</strong></span> can help. | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | :int main() | 
|  | :{ | 
|  | :        big_object a, b; | 
|  | :        for (int i = 0 ; i != 1000 * 1000; ++i) | 
|  | : 80484c0:       push   %ebp | 
|  | : 80484c1:       mov    %esp,%ebp | 
|  | : 80484c3:       sub    $0xfac,%esp | 
|  | : 80484c9:       push   %edi | 
|  | : 80484ca:       push   %esi | 
|  | : 80484cb:       push   %ebx | 
|  | :                b = a; | 
|  | : 80484cc:       lea    0xfffff060(%ebp),%edx | 
|  | : 80484d2:       lea    0xfffff830(%ebp),%eax | 
|  | : 80484d8:       mov    $0xf423f,%ebx | 
|  | : 80484dd:       lea    0x0(%esi),%esi | 
|  | :        return 0; | 
|  | 3 0.03811% : 80484e0:       mov    %edx,%edi | 
|  | : 80484e2:       mov    %eax,%esi | 
|  | 1 0.0127%  : 80484e4:       cld | 
|  | 8 0.1016%  : 80484e5:       mov    $0x1f4,%ecx | 
|  | 7850 99.73%   : 80484ea:       repz movsl %ds:(%esi),%es:(%edi) | 
|  | 9 0.1143%  : 80484ec:       dec    %ebx | 
|  | : 80484ed:       jns    80484e0 | 
|  | : 80484ef:       xor    %eax,%eax | 
|  | : 80484f1:       pop    %ebx | 
|  | : 80484f2:       pop    %esi | 
|  | : 80484f3:       pop    %edi | 
|  | : 80484f4:       leave | 
|  | : 80484f5:       ret | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | So here it's clear that copying is correctly credited with all of the samples, but the | 
|  | line number information is misplaced. <span class="command"><strong>objdump -dS</strong></span> exposes the | 
|  | same problem. Note that it is difficult for compilers to maintain accurate debug information when optimizing, so this problem is not surprising. | 
|  | The accuracy of debug information | 
|  | also depends on the binutils version used; some BFD library versions | 
|  | contain a work-around for known problems of <span class="command"><strong>gcc</strong></span>, while others do not. This is unfortunate, but we must live with it, | 
|  | since profiling is pointless when you disable optimisation (which would give better debugging entries). | 
|  | </p> | 
|  | </div> | 
|  | </div> | 
|  | <div class="sect1" title="5. Assembly functions"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="symbol-without-debug-info"></a>5. Assembly functions</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Often the assembler cannot generate debug information automatically. | 
|  | This means that you cannot get a source report unless | 
|  | you manually define the necessary debug information; read your assembler documentation for how you might | 
|  | do that. The only | 
|  | debugging info currently needed by OProfile is the line-number/filename-VMA association. When profiling assembly | 
|  | without debugging info you can always get a report for symbols, and optionally for VMA, through <span class="command"><strong>opreport -l</strong></span> | 
|  | or <span class="command"><strong>opreport -d</strong></span>, but this works only for symbols with the right attributes. | 
|  | For <span class="command"><strong>gas</strong></span> you can get this by | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | .globl foo | 
|  | .type	foo,@function | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | whilst for <span class="command"><strong>nasm</strong></span> you must use | 
|  | </p> | 
|  | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> | 
|  | <tr> | 
|  | <td> | 
|  | <pre class="screen"> | 
|  | GLOBAL foo:function		; [1] | 
|  | </pre> | 
|  | </td> | 
|  | </tr> | 
|  | </table> | 
|  | <p> | 
|  | Note that OProfile does not need the global attribute, only the function attribute. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect1" title="6. Overlapping symbols in JITed code"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="overlapping-symbols"></a>6. Overlapping symbols in JITed code</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Some virtual machines (e.g., Java) may re-JIT a method, resulting in previously | 
|  | allocated space for a piece of compiled code being reused. This means that, at one distinct | 
|  | code address, multiple symbols/methods may be present during the run time of the application. | 
|  | </p> | 
|  | <p> | 
|  | Since OProfile samples are buffered and don't have timing information, there is no way | 
|  | to correlate samples with the (possibly) varying address ranges in which the code for a symbol | 
|  | may reside. | 
|  | An alternative would be flushing the OProfile sampling buffer when we get an unload event, | 
|  | but this could result in high overhead. | 
|  | </p> | 
|  | <p> | 
|  | To moderate the problem of overlapping symbols, OProfile tries to select the symbol that was | 
|  | present at this address range most of the time. Additionally, other overlapping symbols | 
|  | are truncated in the overlapping area. | 
|  | This gives reasonable results, because in reality, address reuse typically takes place | 
|  | during phase changes of the application -- in particular, during application startup. | 
|  | Thus, for optimum profiling results, start the sampling session after application startup | 
|  | and burn in. | 
|  | </p> | 
|  | </div> | 
|  | <div class="sect1" title="7. Other discrepancies"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title" style="clear: both"><a id="hidden-cost"></a>7. Other discrepancies</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Another cause of apparent problems is the hidden cost of instructions. A very | 
|  | common example is two memory reads, one from L1 cache and the other from main memory; | 
|  | the second read is likely to have more samples. | 
|  | There are many other causes of hidden instruction cost. A non-exhaustive | 
|  | list: mis-predicted branches, TLB cache misses, partial register stalls, | 
|  | partial register dependencies, memory mismatch stalls, and re-executed µops. If you want to write | 
|  | programs at the assembly level, be sure to take a look at the Intel and | 
|  | AMD documentation at <a class="ulink" href="http://developer.intel.com/">http://developer.intel.com/</a> | 
|  | and <a class="ulink" href="http://developer.amd.com/devguides.jsp/">http://developer.amd.com/devguides.jsp</a>. | 
|  | </p> | 
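|  | <p> | 
|  | The memory-read case can be sketched in C as follows. The two functions and their names are hypothetical and chosen only for illustration: both add up the same elements and return the same result, but for a large array and a large stride the second is likely to miss the cache more often, so its load instruction would tend to collect more samples. | 
|  | </p> | 

```c
#include <stddef.h>

/* Walk the array in order: consecutive reads mostly hit the cache. */
long sum_sequential(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

/* Visit every element exactly once, but jump `stride` elements at a time.
 * `stride` must be non-zero. With a large array and a large stride, more
 * of these reads are likely to be served from main memory. */
long sum_strided(const int *data, size_t n, size_t stride)
{
    long total = 0;
    for (size_t start = 0; start < stride; ++start)
        for (size_t i = start; i < n; i += stride)
            total += data[i];
    return total;
}
```

|  | <p> | 
|  | The source lines look equally cheap in both functions; any difference in sample counts comes from where the data happens to be when it is read, not from the instructions themselves. | 
|  | </p> | 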
|  | </div> | 
|  | </div> | 
|  | <div class="chapter" title="Chapter 6. Acknowledgments"> | 
|  | <div class="titlepage"> | 
|  | <div> | 
|  | <div> | 
|  | <h2 class="title"><a id="ack"></a>Chapter 6. Acknowledgments</h2> | 
|  | </div> | 
|  | </div> | 
|  | </div> | 
|  | <p> | 
|  | Thanks to (in no particular order): Arjan van de Ven, Rik van Riel, Juan Quintela, Philippe Elie, | 
|  | Phillipp Rumpf, Tigran Aivazian, Alex Brown, Alisdair Rawsthorne, Bob Montgomery, Ray Bryant, H.J. Lu, | 
|  | Jeff Esper, Will Cohen, Graydon Hoare, Cliff Woolley, Alex Tsariounov, Al Stone, Jason Yeh, | 
|  | Randolph Chung, Anton Blanchard, Richard Henderson, Andries Brouwer, Bryan Rittmeyer, | 
|  | Maynard P. Johnson, | 
|  | Richard Reich (rreich@rdrtech.com), Zwane Mwaikambo, Dave Jones, Charles Filtness; and finally Pulp, for "Intro". | 
|  | </p> | 
|  | </div> | 
|  | </div> | 
|  | </body> | 
|  | </html> |