blob: f35f32daf299dedd763d9465aca7ae2c43ac05ab [file] [log] [blame]
#
# Unit masks for the Intel "sandy-bridge" micro architecture
#
# See http://ark.intel.com/ for help in identifying sandy-bridge based CPUs
#
include:i386/arch_perfmon
name:x02 type:mandatory default:0x2
0x2 No unit mask
name:x10 type:mandatory default:0x10
0x10 No unit mask
name:x20 type:mandatory default:0x20
0x20 No unit mask
name:ld_blocks type:bitmask default:0x1
0x1 extra: data_unknown blocked loads due to store buffer blocks with unknown data.
0x2 extra: store_forward loads blocked by overlapping with store buffer that cannot be forwarded
0x8 extra: no_sr This event counts the number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.
0x10 extra: all_block Number of cases where any load is blocked but has no DCU miss.
name:misalign_mem_ref type:bitmask default:0x1
0x1 extra: loads Speculative cache-line split load uops dispatched to the L1D.
0x2 extra: stores Speculative cache-line split Store-address uops dispatched to L1D
name:ld_blocks_partial type:bitmask default:0x1
0x1 extra: address_alias False dependencies in MOB due to partial compare on address
0x8 extra: all_sta_block This event counts the number of times that load operations are temporarily blocked because of older stores, with addresses that are not yet known. A load operation may incur more than one block of this type.
name:dtlb_load_misses type:bitmask default:0x1
0x1 extra: miss_causes_a_walk Miss in all TLB levels causes an page walk of any page size (4K/2M/4M/1G)
0x2 extra: walk_completed Miss in all TLB levels causes a page walk that completes of any page size (4K/2M/4M/1G)
0x4 extra: walk_duration Cycles PMH is busy with this walk
0x10 extra: stlb_hit First level miss but second level hit; no page walk.
name:int_misc type:bitmask default:0x40
0x40 extra: rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to be recover after Nuke due to all other cases except JEClear.
0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences.
name:uops_issued type:bitmask default:any
0x1 extra: any Number of Uops issued by the Resource Allocation Table (RAT) to the Reservation Station (RS)
0x1 extra:cmask=1,inv stall_cycles cycles no uops issued by this thread.
name:arith type:bitmask default:fpu_div_active
0x1 extra: fpu_div_active Cycles that the divider is busy with any divide or sqrt operation.
0x1 extra:cmask=1,edge fpu_div Number of times that the divider is actived, includes INT, SIMD and FP.
name:l2_rqsts type:bitmask default:0x1
0x1 extra: demand_data_rd_hit Demand Data Read hit L2, no rejects
0x4 extra: rfo_hit RFO requests that hit L2 cache
0x8 extra: rfo_miss RFO requests that miss L2 cache
0x10 extra: code_rd_hit L2 cache hits when fetching instructions, code reads.
0x20 extra: code_rd_miss L2 cache misses when fetching instructions
0x40 extra: pf_hit Requests from the L2 hardware prefetchers that hit L2 cache
0x80 extra: pf_miss Requests from the L2 hardware prefetchers that miss L2 cache
0x3 extra: all_demand_data_rd Any data read request to L2 cache
0xc extra: all_rfo Any data RFO request to L2 cache
0x30 extra: all_code_rd Any code read request to L2 cache
0xc0 extra: all_pf Any L2 HW prefetch request to L2 cache
name:l2_store_lock_rqsts type:bitmask default:0xf
0xf extra: all RFOs that access cache lines in any state
0x1 extra: miss RFO (as a result of regular RFO or Lock request) miss cache - I state
0x4 extra: hit_e RFO (as a result of regular RFO or Lock request) hits cache in E state
0x8 extra: hit_m RFO (as a result of regular RFO or Lock request) hits cache in M state
name:l2_l1d_wb_rqsts type:bitmask default:0x4
0x4 extra: hit_e writebacks from L1D to L2 cache lines in E state
0x8 extra: hit_m writebacks from L1D to L2 cache lines in M state
name:l1d_pend_miss type:bitmask default:pending
0x1 extra: pending Cycles with L1D load Misses outstanding.
0x1 extra:cmask=1,edge occurences This event counts the number of L1D misses outstanding occurences.
name:dtlb_store_misses type:bitmask default:0x1
0x1 extra: miss_causes_a_walk Miss in all TLB levels causes an page walk of any page size (4K/2M/4M/1G)
0x2 extra: walk_completed Miss in all TLB levels causes a page walk that completes of any page size (4K/2M/4M/1G)
0x4 extra: walk_duration Cycles PMH is busy with this walk
0x10 extra: stlb_hit First level miss but second level hit; no page walk. Only relevant if multiple levels.
name:load_hit_pre type:bitmask default:0x1
0x1 extra: sw_pf Load dispatches that hit fill buffer allocated for S/W prefetch.
0x2 extra: hw_pf Load dispatches that hit fill buffer allocated for HW prefetch.
name:l1d type:bitmask default:0x1
0x1 extra: replacement L1D Data line replacements.
0x2 extra: allocated_in_m L1D M-state Data Cache Lines Allocated
0x4 extra: eviction L1D M-state Data Cache Lines Evicted due to replacement (only)
0x8 extra: all_m_replacement All Modified lines evicted out of L1D
name:partial_rat_stalls type:bitmask default:flags_merge_uop
0x20 extra: flags_merge_uop Number of perf sensitive flags-merge uops added by Sandy Bridge u-arch.
0x40 extra: slow_lea_window Number of cycles with at least 1 slow Load Effective Address (LEA) uop being allocated.
0x80 extra: mul_single_uop Number of Multiply packed/scalar single precision uops allocated
0x20 extra:cmask=1 flags_merge_uop_cycles Cycles with perf sensitive flags-merge uops added by SandyBridge u-arch.
name:resource_stalls2 type:bitmask default:0x40
0x40 extra: bob_full Cycles Allocator is stalled due Branch Order Buffer (BOB).
0xf extra: all_prf_control Resource stalls2 control structures full for physical registers
0xc extra: all_fl_empty Cycles with either free list is empty
0x4f extra: ooo_rsrc Resource stalls2 control structures full Physical Register Reclaim Table (PRRT), Physical History Table (PHT), INT or SIMD Free List (FL), Branch Order Buffer (BOB)
name:cpl_cycles type:bitmask default:ring0
0x1 extra: ring0 Unhalted core cycles the Thread was in Rings 0.
0x1 extra:cmask=1,edge ring0_trans Transitions from ring123 to Ring0.
0x2 extra: ring123 Unhalted core cycles the Thread was in Rings 1/2/3.
name:offcore_requests_outstanding type:bitmask default:cycles_with_demand_data_rd
0x1 extra: demand_data_rd Offcore outstanding Demand Data Read transactions in the SuperQueue (SQ), queue to uncore, every cycle. Includes L1D data hardware prefetches.
0x1 extra:cmask=1 cycles_with_demand_data_rd cycles there are Offcore outstanding RD data transactions in the SuperQueue (SQ), queue to uncore.
0x2 extra: demand_code_rd Offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x4 extra: demand_rfo Offcore outstanding RFO (store) transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x8 extra: all_data_rd Offcore outstanding all cacheable Core Data Read transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x8 extra:cmask=1 cycles_with_data_rd Cycles there are Offcore outstanding all Data read transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x2 extra:cmask=1 cycles_with_demand_code_rd Cycles with offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x4 extra:cmask=1 cycles_with_demand_rfo Cycles with offcore outstanding demand RFO Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
name:lock_cycles type:bitmask default:0x1
0x1 extra: split_lock_uc_lock_duration Cycles in which the L1D and L2 are locked, due to a UC lock or split lock
0x2 extra: cache_lock_duration cycles that theL1D is locked
name:idq type:bitmask default:0x2
0x2 extra: empty Cycles the Instruction Decode Queue (IDQ) is empty.
0x4 extra: mite_uops Number of uops delivered to Instruction Decode Queue (IDQ) from MITE path.
0x8 extra: dsb_uops Number of uops delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path.
0x10 extra: ms_dsb_uops Number of Uops delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by Decode Stream Buffer (DSB).
0x20 extra: ms_mite_uops Number of Uops delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by MITE.
0x30 extra: ms_uops Number of Uops were delivered into Instruction Decode Queue (IDQ) from MS, initiated by Decode Stream Buffer (DSB) or MITE.
0x30 extra:cmask=1 ms_cycles Number of cycles that Uops were delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by Decode Stream Buffer (DSB) or MITE.
0x4 extra:cmask=1 mite_cycles Cycles MITE is active
0x8 extra:cmask=1 dsb_cycles Cycles Decode Stream Buffer (DSB) is active
0x10 extra:cmask=1 ms_dsb_cycles Cycles Decode Stream Buffer (DSB) Microcode Sequenser (MS) is active
0x10 extra:cmask=1,edge ms_dsb_occur Occurences of Decode Stream Buffer (DSB) Microcode Sequenser (MS) going active
0x18 extra:cmask=1 all_dsb_cycles_any_uops Cycles Decode Stream Buffer (DSB) is delivering anything
0x18 extra:cmask=4 all_dsb_cycles_4_uops Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
0x24 extra:cmask=1 all_mite_cycles_any_uops Cycles MITE is delivering anything
0x24 extra:cmask=4 all_mite_cycles_4_uops Cycles MITE is delivering 4 Uops
0x3c extra: mite_all_uops Number of uops delivered to Instruction Decode Queue (IDQ) from any path.
name:itlb_misses type:bitmask default:0x1
0x1 extra: miss_causes_a_walk Miss in all TLB levels causes an page walk of any page size (4K/2M/4M)
0x2 extra: walk_completed Miss in all TLB levels causes a page walk that completes of any page size (4K/2M/4M)
0x4 extra: walk_duration Cycles PMH is busy with this walk.
0x10 extra: stlb_hit First level miss but second level hit; no page walk.
name:ild_stall type:bitmask default:0x1
0x1 extra: lcp Stall "occurrences" due to length changing prefixes (LCP).
0x4 extra: iq_full Stall cycles when instructions cannot be written because the Instruction Queue (IQ) is full.
name:br_inst_exec type:bitmask default:0xff
0xff extra: all_branches All branch instructions executed.
0x41 extra: nontaken_conditional All macro conditional nontaken branch instructions.
0x81 extra: taken_conditional All macro conditional taken branch instructions.
0x82 extra: taken_direct_jump All macro unconditional taken branch instructions, excluding calls and indirects.
0x84 extra: taken_indirect_jump_non_call_ret All taken indirect branches that are not calls nor returns.
0x88 extra: taken_indirect_near_return All taken indirect branches that have a return mnemonic.
0x90 extra: taken_direct_near_call All taken non-indirect calls.
0xa0 extra: taken_indirect_near_call All taken indirect calls, including both register and memory indirect.
0xc1 extra: all_conditional All macro conditional branch instructions.
0xc2 extra: all_direct_jmp All macro unconditional branch instructions, excluding calls and indirects
0xc4 extra: all_indirect_jump_non_call_ret All indirect branches that are not calls nor returns.
0xc8 extra: all_indirect_near_return All indirect return branches.
0xd0 extra: all_direct_near_call All non-indirect calls executed.
name:br_misp_exec type:bitmask default:0xff
0xff extra: all_branches All mispredicted branch instructions executed.
0x41 extra: nontaken_conditional All nontaken mispredicted macro conditional branch instructions.
0x81 extra: taken_conditional All taken mispredicted macro conditional branch instructions.
0x84 extra: taken_indirect_jump_non_call_ret All taken mispredicted indirect branches that are not calls nor returns.
0x88 extra: taken_return_near All taken mispredicted indirect branches that have a return mnemonic.
0x90 extra: taken_direct_near_call All taken mispredicted non-indirect calls.
0xa0 extra: taken_indirect_near_call All taken mispredicted indirect calls, including both register and memory indirect.
0xc1 extra: all_conditional All mispredicted macro conditional branch instructions.
0xc4 extra: all_indirect_jump_non_call_ret All mispredicted indirect branches that are not calls nor returns.
0xd0 extra: all_direct_near_call All mispredicted non-indirect calls
name:idq_uops_not_delivered type:bitmask default:core
0x1 extra: core Count number of non-delivered uops to Resource Allocation Table (RAT).
0x1 extra:cmask=4 cycles_0_uops_deliv.core Counts the cycles no uops were delivered
0x1 extra:cmask=3 cycles_le_1_uop_deliv.core Counts the cycles less than 1 uops were delivered
0x1 extra:cmask=2 cycles_le_2_uop_deliv.core Counts the cycles less than 2 uops were delivered
0x1 extra:cmask=1 cycles_le_3_uop_deliv.core Counts the cycles less than 3 uops were delivered
0x1 extra:cmask=4,inv cycles_ge_1_uop_deliv.core Cycles when 1 or more uops were delivered to the by the front end.
0x1 extra:cmask=1,inv cycles_fe_was_ok Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
name:uops_dispatched_port type:bitmask default:0x1
0x1 extra: port_0 Cycles which a Uop is dispatched on port 0
0x2 extra: port_1 Cycles which a Uop is dispatched on port 1
0x4 extra: port_2_ld Cycles which a load Uop is dispatched on port 2
0x8 extra: port_2_sta Cycles which a STA Uop is dispatched on port 2
0x10 extra: port_3_ld Cycles which a load Uop is dispatched on port 3
0x20 extra: port_3_sta Cycles which a STA Uop is dispatched on port 3
0x40 extra: port_4 Cycles which a Uop is dispatched on port 4
0x80 extra: port_5 Cycles which a Uop is dispatched on port 5
0xc extra: port_2 Uops disptached to port 2, loads and stores (speculative and retired)
0x30 extra: port_3 Uops disptached to port 3, loads and stores (speculative and retired)
0xc extra: port_2_core Uops disptached to port 2, loads and stores per core (speculative and retired)
0x30 extra: port_3_core Uops disptached to port 3, loads and stores per core (speculative and retired)
name:resource_stalls type:bitmask default:0x1
0x1 extra: any Cycles Allocation is stalled due to Resource Related reason.
0x2 extra: lb Cycles Allocator is stalled due to Load Buffer full
0x4 extra: rs Stall due to no eligible Reservation Station (RS) entry available.
0x8 extra: sb Cycles Allocator is stalled due to Store Buffer full (not including draining from synch).
0x10 extra: rob ROB full cycles.
0xe extra: mem_rs Resource stalls due to LB, SB or Reservation Station (RS) being completely in use
0xf0 extra: ooo_rsrc Resource stalls due to Rob being full, FCSW, MXCSR and OTHER
0xa extra: lb_sb Resource stalls due to load or store buffers
name:dsb2mite_switches type:bitmask default:0x1
0x1 extra: count Number of Decode Stream Buffer (DSB) to MITE switches
0x2 extra: penalty_cycles Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles.
name:dsb_fill type:bitmask default:0x2
0x2 extra: other_cancel Count number of times a valid DSB fill has been actually cancelled for any reason.
0x8 extra: exceed_dsb_lines Decode Stream Buffer (DSB) Fill encountered > 3 Decode Stream Buffer (DSB) lines.
0xa extra: all_cancel Count number of times a valid Decode Stream Buffer (DSB) fill has been actually cancelled for any reason.
name:offcore_requests type:bitmask default:0x1
0x1 extra: demand_data_rd Demand Data Read requests sent to uncore
0x2 extra: demand_code_rd Offcore Code read requests. Includes Cacheable and Un-cacheables.
0x4 extra: demand_rfo Offcore Demand RFOs. Includes regular RFO, Locks, ItoM.
0x8 extra: all_data_rd Offcore Demand and prefetch data reads returned to the core.
name:uops_dispatched type:bitmask default:thread
0x1 extra: thread Counts total number of uops to be dispatched per-thread each cycle.
0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatced to be executed on this thread.
0x2 extra: core Counts total number of uops dispatched from any thread
name:tlb_flush type:bitmask default:0x1
0x1 extra: dtlb_thread Count number of DTLB flushes of thread-specific entries.
0x20 extra: stlb_any Count number of any STLB flushes
name:l1d_blocks type:bitmask default:bank_conflict_cycles
0x1 extra: ld_bank_conflict Any dispatched loads cancelled due to DCU bank conflict
0x5 extra:cmask=1 bank_conflict_cycles Cycles with l1d blocks due to bank conflicts
name:other_assists type:bitmask default:0x2
0x2 extra: itlb_miss_retired Instructions that experienced an ITLB miss. Non Pebs
0x10 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable Non Pebs
0x20 extra: sse_to_avx Number of transitions from legacy SSE to AVX-256 when penalty applicable Non Pebs
name:uops_retired type:bitmask default:all
0x1 extra: all All uops that actually retired.
0x2 extra: retire_slots number of retirement slots used non PEBS
0x1 extra:cmask=1,inv stall_cycles Cycles no executable uops retired
0x1 extra:cmask=10,inv total_cycles Number of cycles using always true condition applied to non PEBS uops retired event.
name:machine_clears type:bitmask default:0x2
0x2 extra: memory_ordering Number of Memory Ordering Machine Clears detected.
0x4 extra: smc Number of Self-modifying code (SMC) Machine Clears detected.
0x20 extra: maskmov Number of AVX masked mov Machine Clears detected.
name:br_inst_retired type:bitmask default:0x1
0x1 extra: conditional Counts all taken and not taken macro conditional branch instructions.
0x2 extra: near_call Counts all macro direct and indirect near calls. non PEBS
0x8 extra: near_return This event counts the number of near ret instructions retired.
0x10 extra: not_taken Counts all not taken macro branch instructions retired.
0x20 extra: near_taken Counts the number of near branch taken instructions retired.
0x40 extra: far_branch Counts the number of far branch instructions retired.
0x4 extra: all_branches_ps Counts all taken and not taken macro branches including far branches.(Precise Event)
0x2 extra: near_call_r3 Ring123 only near calls (non precise)
0x2 extra: near_call_r3_ps Ring123 only near calls (precise event)
name:br_misp_retired type:bitmask default:0x1
0x1 extra: conditional All mispredicted macro conditional branch instructions.
0x2 extra: near_call All macro direct and indirect near calls
0x10 extra: not_taken number of branch instructions retired that were mispredicted and not-taken.
0x20 extra: taken number of branch instructions retired that were mispredicted and taken.
0x4 extra: all_branches_ps all macro branches (Precise Event)
name:fp_assist type:bitmask default:0x1e
0x1e extra:cmask=1 any Counts any FP_ASSIST umask was incrementing.
0x2 extra: x87_output output - Numeric Overflow, Numeric Underflow, Inexact Result
0x4 extra: x87_input input - Invalid Operation, Denormal Operand, SNaN Operand
0x8 extra: simd_output Any output SSE* FP Assist - Numeric Overflow, Numeric Underflow.
0x10 extra: simd_input Any input SSE* FP Assist
name:mem_uops_retired type:bitmask default:0x11
0x11 extra: stlb_miss_loads STLB misses dues to retired loads
0x12 extra: stlb_miss_stores STLB misses dues to retired stores
0x21 extra: lock_loads Locked retired loads
0x41 extra: split_loads Retired loads causing cacheline splits
0x42 extra: split_stores Retired stores causing cacheline splits
0x81 extra: all_loads Any retired loads
0x82 extra: all_stores Any retired stores
name:mem_load_uops_retired type:bitmask default:0x1
0x1 extra: l1_hit Load hit in nearest-level (L1D) cache
0x2 extra: l2_hit Load hit in mid-level (L2) cache
0x4 extra: llc_hit Load hit in last-level (L3) cache with no snoop needed
0x40 extra: hit_lfb A load missed L1D but hit the Fill Buffer
name:mem_load_uops_llc_hit_retired type:bitmask default:0x1
0x1 extra: xsnp_miss Load LLC Hit and a cross-core Snoop missed in on-pkg core cache
0x2 extra: xsnp_hit Load LLC Hit and a cross-core Snoop hits in on-pkg core cache
0x4 extra: xsnp_hitm Load had HitM Response from a core on same socket (shared LLC).
0x8 extra: xsnp_none Load hit in last-level (L3) cache with no snoop needed.
name:l2_trans type:bitmask default:0x80
0x80 extra: all_requests Transactions accessing L2 pipe
0x1 extra: demand_data_rd Demand Data Read requests that access L2 cache, includes L1D prefetches.
0x2 extra: rfo RFO requests that access L2 cache
0x4 extra: code_rd L2 cache accesses when fetching instructions including L1D code prefetches
0x8 extra: all_pf L2 or LLC HW prefetches that access L2 cache
0x10 extra: l1d_wb L1D writebacks that access L2 cache
0x20 extra: l2_fill L2 fill requests that access L2 cache
0x40 extra: l2_wb L2 writebacks that access L2 cache
name:l2_lines_in type:bitmask default:0x7
0x7 extra: all L2 cache lines filling L2
0x1 extra: i L2 cache lines in I state filling L2
0x2 extra: s L2 cache lines in S state filling L2
0x4 extra: e L2 cache lines in E state filling L2
name:l2_lines_out type:bitmask default:0x1
0x1 extra: demand_clean Clean line evicted by a demand
0x2 extra: demand_dirty Dirty line evicted by a demand
0x4 extra: pf_clean Clean line evicted by an L2 Prefetch
0x8 extra: pf_dirty Dirty line evicted by an L2 Prefetch
0xa extra: dirty_all Any Dirty line evicted