| perf-stat(1) | 
 | ============ | 
 |  | 
 | NAME | 
 | ---- | 
 | perf-stat - Run a command and gather performance counter statistics | 
 |  | 
 | SYNOPSIS | 
 | -------- | 
 | [verse] | 
 | 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command> | 
 | 'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>] | 
 |  | 
 | DESCRIPTION | 
 | ----------- | 
 | This command runs a command and gathers performance counter statistics | 
 | from it. | 
 |  | 
 |  | 
 | OPTIONS | 
 | ------- | 
 | <command>...:: | 
 | 	Any command you can specify in a shell. | 
 |  | 
 |  | 
 | -e:: | 
 | --event=:: | 
 | 	Select the PMU event. Selection can be: | 
 |  | 
 | 	- a symbolic event name (use 'perf list' to list all events) | 
 |  | 
 | 	- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a | 
 | 	  hexadecimal event descriptor. | 
 |  | 
 | 	- a symbolically formed event like 'pmu/param1=0x3,param2/' where | 
 | 	  param1 and param2 are defined as formats for the PMU in | 
 | 	  /sys/bus/event_source/devices/<pmu>/format/* | 
 |  | 
 | 	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/' | 
 | 	  where M, N, K are numbers (in decimal, hexadecimal, or octal format). | 
 | 	  Acceptable values for each of 'config', 'config1' and 'config2' | 
 | 	  parameters are defined by corresponding entries in | 
 | 	  /sys/bus/event_source/devices/<pmu>/format/* | 
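
The 'pmu/param1=...,param2=.../' form described above can be assembled from its parts. The following is a minimal sketch, not part of perf: the helper name pmu_event is hypothetical, and the PMU name 'cpu' and the terms 'event=0x3c' and 'umask=0x00' are assumed examples. Check the format directory of the PMU on the target machine for the names that are actually available.

```shell
# Hypothetical helper: join "name=value" terms into the pmu/.../ event form.
pmu_event() {
    pmu=$1
    shift
    # "$*" joins the remaining arguments with the first character of IFS.
    ( IFS=,; printf '%s/%s/\n' "$pmu" "$*" )
}

# Assumed example terms; the real names come from the PMU's format/ entries.
event=$(pmu_event cpu event=0x3c umask=0x00)
echo "$event"

# Run perf only when it is installed; 'sleep 1' is a placeholder workload.
if command -v perf >/dev/null 2>&1; then
    perf stat -e cycles,instructions -e "$event" -- sleep 1 ||
        echo "perf stat failed (permissions or unsupported event?)" >&2
fi
```

Symbolic names and the pmu/.../ form can be mixed on one command line, as in the guarded invocation above.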
 |  | 
 | -i:: | 
 | --no-inherit:: | 
 |         child tasks do not inherit counters | 
 |  | 
 | -p:: | 
 | --pid=<pid>:: | 
 |         stat events on existing process id (comma separated list) | 
 |  | 
 | -t:: | 
 | --tid=<tid>:: | 
 |         stat events on existing thread id (comma separated list) | 
 |  | 
 |  | 
 | -a:: | 
 | --all-cpus:: | 
 |         system-wide collection from all CPUs | 
 |  | 
 | -c:: | 
 | --scale:: | 
 | 	scale/normalize counter values | 
 |  | 
 | -r:: | 
 | --repeat=<n>:: | 
 | 	repeat command and print average + stddev (max: 100). 0 means forever. | 
 |  | 
 | -B:: | 
 | --big-num:: | 
 |         print large numbers with thousands' separators according to locale | 
 |  | 
 | -C:: | 
 | --cpu=:: | 
 | Count only on the list of CPUs provided. Multiple CPUs can be provided as a | 
 | comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. | 
 | In per-thread mode, this option is ignored. The -a option is still necessary | 
 | to activate system-wide monitoring. Default is to count on all CPUs. | 
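
The list syntax above (comma-separated CPUs, ranges written with -) can be illustrated with a small shell sketch. The helper name expand_cpu_list is hypothetical and is not part of perf. perf parses the list itself; this only shows which CPUs a given list denotes.

```shell
# Hypothetical helper: expand a CPU list such as "0,2-4" into single CPUs.
expand_cpu_list() {
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the list on commas
    IFS=$old_ifs
    out=""
    for part in "$@"; do
        case $part in
            *-*)
                # A range "first-last" covers every CPU in between.
                first=${part%-*}
                last=${part#*-}
                cpu=$first
                while [ "$cpu" -le "$last" ]; do
                    out="$out $cpu"
                    cpu=$((cpu + 1))
                done
                ;;
            *)
                out="$out $part"
                ;;
        esac
    done
    echo "${out# }"
}

expand_cpu_list 0,1      # prints: 0 1
expand_cpu_list 0-2      # prints: 0 1 2
```

A matching invocation would be 'perf stat -a -C 0-2 -e cycles sleep 5', with -a supplied as noted above.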
 |  | 
 | -A:: | 
 | --no-aggr:: | 
 | Do not aggregate counts across all monitored CPUs in system-wide mode (-a). | 
 | This option is only valid in system-wide mode. | 
 |  | 
 | -n:: | 
 | --null:: | 
 |         null run - don't start any counters | 
 |  | 
 | -v:: | 
 | --verbose:: | 
 |         be more verbose (show counter open errors, etc) | 
 |  | 
 | -x SEP:: | 
 | --field-separator SEP:: | 
 | print counts using a CSV-style output to make it easy to import directly into | 
 | spreadsheets. Columns are separated by the string specified in SEP. | 
 |  | 
 | -G name:: | 
 | --cgroup name:: | 
 | monitor only in the container (cgroup) called "name". This option is available only | 
 | in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to | 
 | container "name" are monitored when they run on the monitored CPUs. Multiple cgroups | 
 | can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup | 
 | to first event, second cgroup to second event and so on. It is possible to provide | 
 | an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have | 
 | corresponding events, i.e., they always refer to events defined earlier on the command | 
 | line. | 
 |  | 
 | -o file:: | 
 | --output file:: | 
 | Print the output into the designated file. | 
 |  | 
 | --append:: | 
 | Append to the output file designated with the -o option. Ignored if -o is not specified. | 
 |  | 
 | --log-fd:: | 
 |  | 
 | Log output to fd, instead of stderr.  Complementary to --output, and mutually exclusive | 
 | with it.  --append may be used here.  Examples: | 
 |      3>results  perf stat --log-fd 3          -- $cmd | 
 |      3>>results perf stat --log-fd 3 --append -- $cmd | 
 |  | 
 | --pre:: | 
 | --post:: | 
 | 	Pre and post measurement hooks, e.g.: | 
 |  | 
 | perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage | 
 |  | 
 | -I msecs:: | 
 | --interval-print msecs:: | 
 | 	Print count deltas every msecs milliseconds (minimum: 100ms) | 
 | 	example: perf stat -I 1000 -e cycles -a sleep 5 | 
 |  | 
 | --per-socket:: | 
 | Aggregate counts per processor socket for system-wide mode measurements.  This | 
 | is a useful mode to detect imbalance between sockets.  To enable this mode, | 
 | use --per-socket in addition to -a (system-wide).  The output includes the | 
 | socket number and the number of online processors on that socket. This is | 
 | useful to gauge the amount of aggregation. | 
 |  | 
 | --per-core:: | 
 | Aggregate counts per physical core for system-wide mode measurements.  This | 
 | is a useful mode to detect imbalance between physical cores.  To enable this mode, | 
 | use --per-core in addition to -a (system-wide).  The output includes the | 
 | core number and the number of online logical processors on that physical core. | 
 |  | 
 | -D msecs:: | 
 | --delay msecs:: | 
 | After starting the program, wait msecs milliseconds before measuring. This is useful to | 
 | filter out the startup phase of the program, whose behavior often differs from the rest of the run. | 
 |  | 
 | -T:: | 
 | --transaction:: | 
 |  | 
 | Print statistics of transactional execution if supported. | 
 |  | 
 | EXAMPLES | 
 | -------- | 
 |  | 
 | $ perf stat -- make -j | 
 |  | 
 |  Performance counter stats for 'make -j': | 
 |  | 
 |     8117.370256  task clock ticks     #      11.281 CPU utilization factor | 
 |             678  context switches     #       0.000 M/sec | 
 |             133  CPU migrations       #       0.000 M/sec | 
 |          235724  pagefaults           #       0.029 M/sec | 
 |     24821162526  CPU cycles           #    3057.784 M/sec | 
 |     18687303457  instructions         #    2302.138 M/sec | 
 |       172158895  cache references     #      21.209 M/sec | 
 |        27075259  cache misses         #       3.335 M/sec | 
 |  | 
 |  Wall-clock time elapsed:   719.554352 msecs | 
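
The derived figures in this output can be recomputed from the raw counts. This is a minimal sketch using the numbers from the sample run above. The variable names are illustrative, and it assumes the task clock value is in milliseconds, which is consistent with the printed utilization factor. Instructions per cycle and the cache miss ratio are common derived metrics, not columns printed in this run.

```shell
# Raw values copied from the sample output above.
cycles=24821162526
instructions=18687303457
cache_refs=172158895
cache_misses=27075259
task_clock_ms=8117.370256
wall_ms=719.554352

summary=$(awk -v c="$cycles" -v i="$instructions" \
              -v r="$cache_refs" -v m="$cache_misses" \
              -v t="$task_clock_ms" -v w="$wall_ms" 'BEGIN {
    printf "IPC: %.3f\n", i / c                      # instructions per cycle
    printf "cache miss ratio: %.2f%%\n", 100 * m / r # misses per reference
    printf "CPU utilization: %.3f\n", t / w          # task clock / wall clock
}')
echo "$summary"
# IPC: 0.753
# cache miss ratio: 15.73%
# CPU utilization: 11.281
```

The last value reproduces the "CPU utilization factor" column: roughly eleven CPUs were busy on average during the parallel build.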
 |  | 
 | SEE ALSO | 
 | -------- | 
 | linkperf:perf-top[1], linkperf:perf-list[1] |