#
# Unit masks for the Intel "sandy-bridge" micro architecture
#
# See http://ark.intel.com/ for help in identifying sandy-bridge based CPUs
#
include:i386/arch_perfmon
name:x02 type:mandatory default:0x2
0x2 No unit mask
name:x10 type:mandatory default:0x10
0x10 No unit mask
name:x20 type:mandatory default:0x20
0x20 No unit mask
name:ld_blocks type:bitmask default:0x1
0x1 data_unknown blocked loads due to store buffer blocks with unknown data.
0x2 store_forward loads blocked by overlapping with store buffer that cannot be forwarded
0x8 no_sr This event counts the number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.
0x10 all_block Number of cases where any load is blocked but has no DCU miss.
name:misalign_mem_ref type:bitmask default:0x1
0x1 loads Speculative cache-line split load uops dispatched to the L1D.
0x2 stores Speculative cache-line split Store-address uops dispatched to L1D
name:ld_blocks_partial type:bitmask default:0x1
0x1 address_alias False dependencies in MOB due to partial compare on address
0x8 all_sta_block This event counts the number of times that load operations are temporarily blocked because of older stores, with addresses that are not yet known. A load operation may incur more than one block of this type.
name:dtlb_load_misses type:bitmask default:0x1
0x1 miss_causes_a_walk Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G)
0x2 walk_completed Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G) that completes
0x4 walk_duration Cycles PMH is busy with this walk
0x10 stlb_hit First level miss but second level hit; no page walk.
name:int_misc type:bitmask default:0x40
0x40 rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is sent to Instruction Decode Queue (IDQ) for this thread.
0x3 extra:cmask=1 recovery_cycles Number of cycles waiting to recover after a Nuke due to all cases other than JEClear.
0x3 extra:cmask=1,edge recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences.
name:uops_issued type:bitmask default:0x1
0x1 any Number of Uops issued by the Resource Allocation Table (RAT) to the Reservation Station (RS)
0x1 extra:cmask=1,inv stall_cycles Cycles with no uops issued by this thread.
name:arith type:bitmask default:0x1
0x1 fpu_div_active Cycles that the divider is busy with any divide or sqrt operation.
0x1 extra:cmask=1,edge fpu_div Number of times that the divider is activated; includes INT, SIMD and FP.
name:l2_rqsts type:bitmask default:0x1
0x1 demand_data_rd_hit Demand Data Read hit L2, no rejects
0x4 rfo_hit RFO requests that hit L2 cache
0x8 rfo_miss RFO requests that miss L2 cache
0x10 code_rd_hit L2 cache hits when fetching instructions, code reads.
0x20 code_rd_miss L2 cache misses when fetching instructions
0x40 pf_hit Requests from the L2 hardware prefetchers that hit L2 cache
0x80 pf_miss Requests from the L2 hardware prefetchers that miss L2 cache
0x3 all_demand_data_rd Any data read request to L2 cache
0xc all_rfo Any data RFO request to L2 cache
0x30 all_code_rd Any code read request to L2 cache
0xc0 all_pf Any L2 HW prefetch request to L2 cache
name:l2_store_lock_rqsts type:bitmask default:0xf
0xf all RFOs that access cache lines in any state
0x1 miss RFO (as a result of regular RFO or Lock request) miss cache - I state
0x4 hit_e RFO (as a result of regular RFO or Lock request) hits cache in E state
0x8 hit_m RFO (as a result of regular RFO or Lock request) hits cache in M state
name:l2_l1d_wb_rqsts type:bitmask default:0x4
0x4 hit_e writebacks from L1D to L2 cache lines in E state
0x8 hit_m writebacks from L1D to L2 cache lines in M state
name:l1d_pend_miss type:bitmask default:0x1
0x1 pending Cycles with L1D load Misses outstanding.
0x1 extra:cmask=1,edge occurences This event counts the number of occurrences of outstanding L1D misses.
name:dtlb_store_misses type:bitmask default:0x1
0x1 miss_causes_a_walk Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G)
0x2 walk_completed Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G) that completes
0x4 walk_duration Cycles PMH is busy with this walk
0x10 stlb_hit First level miss but second level hit; no page walk. Only relevant if multiple levels.
name:load_hit_pre type:bitmask default:0x1
0x1 sw_pf Load dispatches that hit fill buffer allocated for S/W prefetch.
0x2 hw_pf Load dispatches that hit fill buffer allocated for HW prefetch.
name:l1d type:bitmask default:0x1
0x1 replacement L1D Data line replacements.
0x2 allocated_in_m L1D M-state Data Cache Lines Allocated
0x4 eviction L1D M-state Data Cache Lines Evicted due to replacement (only)
0x8 all_m_replacement All Modified lines evicted out of L1D
name:partial_rat_stalls type:bitmask default:0x20
0x20 flags_merge_uop Number of perf sensitive flags-merge uops added by Sandy Bridge u-arch.
0x40 slow_lea_window Number of cycles with at least 1 slow Load Effective Address (LEA) uop being allocated.
0x80 mul_single_uop Number of Multiply packed/scalar single precision uops allocated
0x20 extra:cmask=1 flags_merge_uop_cycles Cycles with perf sensitive flags-merge uops added by SandyBridge u-arch.
name:resource_stalls2 type:bitmask default:0x40
0x40 bob_full Cycles Allocator is stalled due to Branch Order Buffer (BOB).
0xf all_prf_control Resource stalls2 control structures full for physical registers
0xc all_fl_empty Cycles in which either free list is empty
0x4f ooo_rsrc Resource stalls2 control structures full Physical Register Reclaim Table (PRRT), Physical History Table (PHT), INT or SIMD Free List (FL), Branch Order Buffer (BOB)
name:cpl_cycles type:bitmask default:0x1
0x1 ring0 Unhalted core cycles the Thread was in Ring 0.
0x1 extra:cmask=1,edge ring0_trans Transitions from ring123 to Ring0.
0x2 ring123 Unhalted core cycles the Thread was in Rings 1/2/3.
name:offcore_requests_outstanding type:bitmask default:0x1
0x1 demand_data_rd Offcore outstanding Demand Data Read transactions in the SuperQueue (SQ), queue to uncore, every cycle. Includes L1D data hardware prefetches.
0x1 extra:cmask=1 cycles_with_demand_data_rd Cycles with Offcore outstanding Demand Data Read transactions in the SuperQueue (SQ), queue to uncore.
0x2 demand_code_rd Offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x4 demand_rfo Offcore outstanding RFO (store) transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x8 all_data_rd Offcore outstanding all cacheable Core Data Read transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x8 extra:cmask=1 cycles_with_data_rd Cycles there are Offcore outstanding all Data read transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x2 extra:cmask=1 cycles_with_demand_code_rd Cycles with offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x4 extra:cmask=1 cycles_with_demand_rfo Cycles with offcore outstanding demand RFO transactions in the SuperQueue (SQ), queue to uncore, every cycle.
name:lock_cycles type:bitmask default:0x1
0x1 split_lock_uc_lock_duration Cycles in which the L1D and L2 are locked, due to a UC lock or split lock
0x2 cache_lock_duration Cycles that the L1D is locked
name:idq type:bitmask default:0x2
0x2 empty Cycles the Instruction Decode Queue (IDQ) is empty.
0x4 mite_uops Number of uops delivered to Instruction Decode Queue (IDQ) from MITE path.
0x8 dsb_uops Number of uops delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path.
0x10 ms_dsb_uops Number of Uops delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by Decode Stream Buffer (DSB).
0x20 ms_mite_uops Number of Uops delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by MITE.
0x30 ms_uops Number of Uops delivered into Instruction Decode Queue (IDQ) from MS, initiated by Decode Stream Buffer (DSB) or MITE.
0x30 extra:cmask=1 ms_cycles Number of cycles that Uops were delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by Decode Stream Buffer (DSB) or MITE.
0x4 extra:cmask=1 mite_cycles Cycles MITE is active
0x8 extra:cmask=1 dsb_cycles Cycles Decode Stream Buffer (DSB) is active
0x10 extra:cmask=1 ms_dsb_cycles Cycles Decode Stream Buffer (DSB) Microcode Sequencer (MS) is active
0x10 extra:cmask=1,edge ms_dsb_occur Occurrences of Decode Stream Buffer (DSB) Microcode Sequencer (MS) going active
0x18 extra:cmask=1 all_dsb_cycles_any_uops Cycles Decode Stream Buffer (DSB) is delivering anything
0x18 extra:cmask=4 all_dsb_cycles_4_uops Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
0x24 extra:cmask=1 all_mite_cycles_any_uops Cycles MITE is delivering anything
0x24 extra:cmask=4 all_mite_cycles_4_uops Cycles MITE is delivering 4 Uops
0x3c mite_all_uops Number of uops delivered to Instruction Decode Queue (IDQ) from any path.
name:itlb_misses type:bitmask default:0x1
0x1 miss_causes_a_walk Miss in all TLB levels causes a page walk of any page size (4K/2M/4M)
0x2 walk_completed Miss in all TLB levels causes a page walk of any page size (4K/2M/4M) that completes
0x4 walk_duration Cycles PMH is busy with this walk.
0x10 stlb_hit First level miss but second level hit; no page walk.
name:ild_stall type:bitmask default:0x1
0x1 lcp Stall "occurrences" due to length changing prefixes (LCP).
0x4 iq_full Stall cycles when instructions cannot be written because the Instruction Queue (IQ) is full.
name:br_inst_exec type:bitmask default:0xff
0xff all_branches All branch instructions executed.
0x41 nontaken_conditional All macro conditional nontaken branch instructions.
0x81 taken_conditional All macro conditional taken branch instructions.
0x82 taken_direct_jump All macro unconditional taken branch instructions, excluding calls and indirects.
0x84 taken_indirect_jump_non_call_ret All taken indirect branches that are not calls nor returns.
0x88 taken_indirect_near_return All taken indirect branches that have a return mnemonic.
0x90 taken_direct_near_call All taken non-indirect calls.
0xa0 taken_indirect_near_call All taken indirect calls, including both register and memory indirect.
0xc1 all_conditional All macro conditional branch instructions.
0xc2 all_direct_jmp All macro unconditional branch instructions, excluding calls and indirects
0xc4 all_indirect_jump_non_call_ret All indirect branches that are not calls nor returns.
0xc8 all_indirect_near_return All indirect return branches.
0xd0 all_direct_near_call All non-indirect calls executed.
name:br_misp_exec type:bitmask default:0xff
0xff all_branches All mispredicted branch instructions executed.
0x41 nontaken_conditional All nontaken mispredicted macro conditional branch instructions.
0x81 taken_conditional All taken mispredicted macro conditional branch instructions.
0x84 taken_indirect_jump_non_call_ret All taken mispredicted indirect branches that are not calls nor returns.
0x88 taken_return_near All taken mispredicted indirect branches that have a return mnemonic.
0x90 taken_direct_near_call All taken mispredicted non-indirect calls.
0xa0 taken_indirect_near_call All taken mispredicted indirect calls, including both register and memory indirect.
0xc1 all_conditional All mispredicted macro conditional branch instructions.
0xc4 all_indirect_jump_non_call_ret All mispredicted indirect branches that are not calls nor returns.
0xd0 all_direct_near_call All mispredicted non-indirect calls
name:idq_uops_not_delivered type:bitmask default:0x1
0x1 core Count number of non-delivered uops to Resource Allocation Table (RAT).
0x1 extra:cmask=4 cycles_0_uops_deliv.core Counts the cycles no uops were delivered
0x1 extra:cmask=3 cycles_le_1_uop_deliv.core Counts the cycles in which 1 or fewer uops were delivered
0x1 extra:cmask=2 cycles_le_2_uop_deliv.core Counts the cycles in which 2 or fewer uops were delivered
0x1 extra:cmask=1 cycles_le_3_uop_deliv.core Counts the cycles in which 3 or fewer uops were delivered
0x1 extra:cmask=4,inv cycles_ge_1_uop_deliv.core Cycles when 1 or more uops were delivered by the front end.
0x1 extra:cmask=1,inv cycles_fe_was_ok Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
name:uops_dispatched_port type:bitmask default:0x1
0x1 port_0 Cycles in which a Uop is dispatched on port 0
0x2 port_1 Cycles in which a Uop is dispatched on port 1
0x4 port_2_ld Cycles in which a load Uop is dispatched on port 2
0x8 port_2_sta Cycles in which a STA Uop is dispatched on port 2
0x10 port_3_ld Cycles in which a load Uop is dispatched on port 3
0x20 port_3_sta Cycles in which a STA Uop is dispatched on port 3
0x40 port_4 Cycles in which a Uop is dispatched on port 4
0x80 port_5 Cycles in which a Uop is dispatched on port 5
0xc port_2 Uops dispatched to port 2, loads and stores (speculative and retired)
0x30 port_3 Uops dispatched to port 3, loads and stores (speculative and retired)
0xc port_2_core Uops dispatched to port 2, loads and stores per core (speculative and retired)
0x30 port_3_core Uops dispatched to port 3, loads and stores per core (speculative and retired)
name:resource_stalls type:bitmask default:0x1
0x1 any Cycles Allocation is stalled due to Resource Related reason.
0x2 lb Cycles Allocator is stalled due to Load Buffer full
0x4 rs Stall due to no eligible Reservation Station (RS) entry available.
0x8 sb Cycles Allocator is stalled due to Store Buffer full (not including draining from synch).
0x10 rob ROB full cycles.
0xe mem_rs Resource stalls due to LB, SB or Reservation Station (RS) being completely in use
0xf0 ooo_rsrc Resource stalls due to ROB being full, FCSW, MXCSR and OTHER
0xa lb_sb Resource stalls due to load or store buffers
name:dsb2mite_switches type:bitmask default:0x1
0x1 count Number of Decode Stream Buffer (DSB) to MITE switches
0x2 penalty_cycles Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles.
name:dsb_fill type:bitmask default:0x2
0x2 other_cancel Count number of times a valid DSB fill has been actually cancelled for any reason.
0x8 exceed_dsb_lines Decode Stream Buffer (DSB) Fill encountered > 3 Decode Stream Buffer (DSB) lines.
0xa all_cancel Count number of times a valid Decode Stream Buffer (DSB) fill has been actually cancelled for any reason.
name:offcore_requests type:bitmask default:0x1
0x1 demand_data_rd Demand Data Read requests sent to uncore
0x2 demand_code_rd Offcore Code read requests. Includes Cacheable and Un-cacheables.
0x4 demand_rfo Offcore Demand RFOs. Includes regular RFO, Locks, ItoM.
0x8 all_data_rd Offcore Demand and prefetch data reads returned to the core.
name:uops_dispatched type:bitmask default:0x1
0x1 thread Counts total number of uops to be dispatched per-thread each cycle.
0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread.
0x2 core Counts total number of uops dispatched from any thread
name:tlb_flush type:bitmask default:0x1
0x1 dtlb_thread Count number of DTLB flushes of thread-specific entries.
0x20 stlb_any Count number of any STLB flushes
name:l1d_blocks type:bitmask default:0x1
0x1 ld_bank_conflict Any dispatched loads cancelled due to DCU bank conflict
0x5 extra:cmask=1 bank_conflict_cycles Cycles with l1d blocks due to bank conflicts
name:other_assists type:bitmask default:0x2
0x2 itlb_miss_retired Instructions that experienced an ITLB miss. Non Pebs
0x10 avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable Non Pebs
0x20 sse_to_avx Number of transitions from legacy SSE to AVX-256 when penalty applicable Non Pebs
name:uops_retired type:bitmask default:0x1
0x1 all All uops that actually retired.
0x2 retire_slots number of retirement slots used non PEBS
0x1 extra:cmask=1,inv stall_cycles Cycles no executable uops retired
0x1 extra:cmask=10,inv total_cycles Number of cycles using always true condition applied to non PEBS uops retired event.
name:machine_clears type:bitmask default:0x2
0x2 memory_ordering Number of Memory Ordering Machine Clears detected.
0x4 smc Number of Self-modifying code (SMC) Machine Clears detected.
0x20 maskmov Number of AVX masked mov Machine Clears detected.
name:br_inst_retired type:bitmask default:0x1
0x1 conditional Counts all taken and not taken macro conditional branch instructions.
0x2 near_call Counts all macro direct and indirect near calls. non PEBS
0x8 near_return This event counts the number of near ret instructions retired.
0x10 not_taken Counts all not taken macro branch instructions retired.
0x20 near_taken Counts the number of near branch taken instructions retired.
0x40 far_branch Counts the number of far branch instructions retired.
0x4 all_branches_ps Counts all taken and not taken macro branches including far branches.(Precise Event)
0x2 near_call_r3 Ring123 only near calls (non precise)
0x2 near_call_r3_ps Ring123 only near calls (precise event)
name:br_misp_retired type:bitmask default:0x1
0x1 conditional All mispredicted macro conditional branch instructions.
0x2 near_call All macro direct and indirect near calls
0x10 not_taken number of branch instructions retired that were mispredicted and not-taken.
0x20 taken number of branch instructions retired that were mispredicted and taken.
0x4 all_branches_ps all macro branches (Precise Event)
name:fp_assist type:bitmask default:0x1e
0x1e extra:cmask=1 any Counts any FP_ASSIST umask was incrementing.
0x2 x87_output output - Numeric Overflow, Numeric Underflow, Inexact Result
0x4 x87_input input - Invalid Operation, Denormal Operand, SNaN Operand
0x8 simd_output Any output SSE* FP Assist - Numeric Overflow, Numeric Underflow.
0x10 simd_input Any input SSE* FP Assist
name:mem_uops_retired type:bitmask default:0x11
0x11 stlb_miss_loads STLB misses due to retired loads
0x12 stlb_miss_stores STLB misses due to retired stores
0x21 lock_loads Locked retired loads
0x41 split_loads Retired loads causing cacheline splits
0x42 split_stores Retired stores causing cacheline splits
0x81 all_loads Any retired loads
0x82 all_stores Any retired stores
name:mem_load_uops_retired type:bitmask default:0x1
0x1 l1_hit Load hit in nearest-level (L1D) cache
0x2 l2_hit Load hit in mid-level (L2) cache
0x4 llc_hit Load hit in last-level (L3) cache with no snoop needed
0x40 hit_lfb A load missed L1D but hit the Fill Buffer
name:mem_load_uops_llc_hit_retired type:bitmask default:0x1
0x1 xsnp_miss Load LLC Hit and a cross-core Snoop missed in on-pkg core cache
0x2 xsnp_hit Load LLC Hit and a cross-core Snoop hits in on-pkg core cache
0x4 xsnp_hitm Load had HitM Response from a core on same socket (shared LLC).
0x8 xsnp_none Load hit in last-level (L3) cache with no snoop needed.
name:l2_trans type:bitmask default:0x80
0x80 all_requests Transactions accessing L2 pipe
0x1 demand_data_rd Demand Data Read requests that access L2 cache, includes L1D prefetches.
0x2 rfo RFO requests that access L2 cache
0x4 code_rd L2 cache accesses when fetching instructions including L1D code prefetches
0x8 all_pf L2 or LLC HW prefetches that access L2 cache
0x10 l1d_wb L1D writebacks that access L2 cache
0x20 l2_fill L2 fill requests that access L2 cache
0x40 l2_wb L2 writebacks that access L2 cache
name:l2_lines_in type:bitmask default:0x7
0x7 all L2 cache lines filling L2
0x1 i L2 cache lines in I state filling L2
0x2 s L2 cache lines in S state filling L2
0x4 e L2 cache lines in E state filling L2
name:l2_lines_out type:bitmask default:0x1
0x1 demand_clean Clean line evicted by a demand
0x2 demand_dirty Dirty line evicted by a demand
0x4 pf_clean Clean line evicted by an L2 Prefetch
0x8 pf_dirty Dirty line evicted by an L2 Prefetch
0xa dirty_all Any Dirty line evicted