how to use perf record effectively
jim.cromie at gmail.com
Fri Nov 10 14:11:14 EST 2023
I have a working patchset which de-duplicates the pr_debug
per-callsite (module, filename, function) data.
It loads that column data into 3 maple trees,
and simple accessor functions retrieve the data
by lookup with the pr_debug address.
So it stores these callsites:
[ 0.721980] dyndbg: 3653 prdebugs in 307 modules, 19 KiB in ddebug tables, 114 kiB ..
into these intervals:
[ 104.047210] dyndbg: mt-funcs has 2174 entries
[ 104.047816] dyndbg: mt-files has 539 entries
[ 104.048410] dyndbg: mt-mods has 312 entries
Once these are loaded, the __dyndbg_sites section,
which holds the 3 columns split out of the __dyndbg section,
can be recycled.
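To make the lookup idea concrete, here is a rough user-space model of it, not the kernel code. It assumes that consecutive descriptors sharing the same string collapse into one address range, which is how 3653 callsites could end up as 2174 function entries. The `RangeMap` class, the 56-byte descriptor size, and the function names are all made up for illustration.

```python
import bisect


class RangeMap:
    """Map half-open address ranges [start, end) to one shared value."""

    def __init__(self):
        self._starts = []   # sorted range start addresses
        self._ends = []     # matching range end addresses (exclusive)
        self._values = []   # one stored string per range

    def append(self, addr, size, value):
        """Add a descriptor at addr; grow the last range if the value repeats.

        Assumes descriptors arrive in ascending address order.
        """
        if self._values and self._values[-1] == value and self._ends[-1] == addr:
            self._ends[-1] = addr + size      # contiguous and same value: extend
        else:
            self._starts.append(addr)
            self._ends.append(addr + size)
            self._values.append(value)

    def lookup(self, addr):
        """Return the value whose range contains addr, or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and addr < self._ends[i]:
            return self._values[i]
        return None

    def __len__(self):
        return len(self._values)


DESC_SIZE = 56  # hypothetical descriptor size, for illustration only
names = ["probe", "probe", "probe", "remove", "remove"]  # hypothetical functions
sites = [(0x1000 + i * DESC_SIZE, fn) for i, fn in enumerate(names)]

funcs = RangeMap()
for addr, fn in sites:
    funcs.append(addr, DESC_SIZE, fn)

print(len(sites), "callsites ->", len(funcs), "ranges")
print(funcs.lookup(0x1000 + 1 * DESC_SIZE))
```

Each accessor call is then one ordered search instead of a direct field read, which is the extra cost the measurements are trying to expose.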
ALL GOOD SO FAR.
BUT WHAT'S THE RUNTIME COST OF THIS?
perf stat -r200 cat /proc/dynamic_debug/control > /dev/null;
This should be a good test: it calls all 3 accessors for each
pr_debug in the kernel.
But comparing master against this branch shows little change,
and adding --table to see the per-run variation
suggests that the change is smaller than the variation within a test.
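One possible way to look at the run-to-run variation without the fork/exec cost of cat is to time the read directly. The sketch below is only that, a sketch: the helper names `time_reads` and `summarize` are invented here, it measures wall time including interpreter overhead, and reading the control file normally needs root.

```python
import statistics
import sys
import time


def time_reads(path, runs=200):
    """Read path to EOF `runs` times; return per-run wall times in microseconds."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        with open(path, "rb") as f:
            while f.read(65536):   # drain the file in 64 KiB chunks
                pass
        samples.append((time.perf_counter_ns() - t0) / 1000.0)
    return samples


def summarize(samples):
    """Return (mean, sample stdev, stdev as a percentage of the mean)."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    return mean, sd, (100.0 * sd / mean if mean else 0.0)


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/dynamic_debug/control"
    try:
        mean, sd, pct = summarize(time_reads(path))
        print(f"{path}: {mean:.1f} us +- {sd:.1f} us ({pct:.2f}%)")
    except OSError as err:
        print(f"cannot read {path}: {err}")
```

Running it once on each kernel and comparing the two means against the larger of the two percentages gives a cruder version of what perf stat --table shows.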
MASTER - v6.6
bash-5.2# perf stat -r 200 cat /proc/dynamic_debug/control > /dev/null
 Performance counter stats for 'cat /proc/dynamic_debug/control' (200 runs):
      10.29 msec task-clock               #   0.713 CPUs utilized            ( +- 0.56% )
         43      context-switches         #   4.177 K/sec                    ( +- 0.03% )
          1      cpu-migrations           #  97.142 /sec                     ( +- 5.80% )
         73      page-faults              #   7.091 K/sec                    ( +- 0.10% )
    8906200      cycles                   #   0.865 GHz                      ( +- 0.17% )
     147349      stalled-cycles-frontend  #   1.65% frontend cycles idle     ( +- 0.18% )
      24971      stalled-cycles-backend   #   0.28% backend cycles idle      ( +- 8.18% )
   20589718      instructions             #   2.31 insn per cycle
                                          #   0.01 stalled cycles per insn   ( +- 0.02% )
    5470202      branches                 # 531.388 M/sec                    ( +- 0.01% )
          0      branch-misses
  0.0144421 +- 0.0000647 seconds time elapsed ( +- 0.45% )
DE_DUPLICATION branch
bash-5.2# perf stat -r200 cat /proc/dynamic_debug/control > /dev/null
 Performance counter stats for 'cat /proc/dynamic_debug/control' (200 runs):
      21.89 msec task-clock               #   0.622 CPUs utilized            ( +- 0.69% )
         44      context-switches         #   2.010 K/sec                    ( +- 0.12% )
          1      cpu-migrations           #  45.693 /sec                     ( +- 3.87% )
         73      page-faults              #   3.336 K/sec                    ( +- 0.10% )
   52017542      cycles                   #   2.377 GHz                      ( +- 0.54% )
     177875      stalled-cycles-frontend  #   0.34% frontend cycles idle     ( +- 0.48% )
     134469      stalled-cycles-backend   #   0.26% backend cycles idle      ( +- 4.24% )
  134707837      instructions             #   2.59 insn per cycle
                                          #   0.00 stalled cycles per insn   ( +- 0.30% )
   39386555      branches                 #   1.800 G/sec                    ( +- 0.29% )
          0      branch-misses
   0.035188 +- 0.000167 seconds time elapsed ( +- 0.47% )
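For a side-by-side reading of the two summaries above, the arithmetic can be sketched as below. The means and +- percentages are copied from that output; the `compare` helper is made up for this note. If a ratio lands far outside the reported spread, it may be worth checking that the two boots were like-for-like (the reported GHz figures differ, for instance) before attributing it to the patchset.

```python
# (mean, reported +- percent) copied from the two `perf stat -r 200` summaries.
master = {
    "task-clock (msec)": (10.29, 0.56),
    "cycles": (8906200, 0.17),
    "instructions": (20589718, 0.02),
    "branches": (5470202, 0.01),
    "time elapsed (s)": (0.0144421, 0.45),
}
dedup = {
    "task-clock (msec)": (21.89, 0.69),
    "cycles": (52017542, 0.54),
    "instructions": (134707837, 0.30),
    "branches": (39386555, 0.29),
    "time elapsed (s)": (0.035188, 0.47),
}


def compare(base, other):
    """Return (name, ratio, percent change, larger reported spread) per counter."""
    rows = []
    for name, (b_mean, b_pct) in base.items():
        o_mean, o_pct = other[name]
        ratio = o_mean / b_mean
        change_pct = 100.0 * (o_mean - b_mean) / b_mean
        rows.append((name, ratio, change_pct, max(b_pct, o_pct)))
    return rows


for name, ratio, change_pct, spread in compare(master, dedup):
    print(f"{name:20s} x{ratio:5.2f}  change {change_pct:+8.1f}%  "
          f"reported spread +-{spread:.2f}%")
```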
I tried perf stat record, then perf diff on the results;
it showed empty comparisons on a handful of event types:
[jimc@frodo boots-dump]$ perf diff -v \
    v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0* \
    v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675* \
    v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906* > foo
group desc not available
pmu capabilities not available
group desc not available
pmu capabilities not available
group desc not available
pmu capabilities not available
group desc not available
pmu capabilities not available
group desc not available
pmu capabilities not available
group desc not available
pmu capabilities not available
group desc not available
pmu capabilities not available
[jimc@frodo boots-dump]$ more foo
# Event 'task-clock'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
# Event 'context-switches'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
# Event 'cpu-migrations'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
# Event 'page-faults'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
# Event 'cycles'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
# Event 'stalled-cycles-frontend'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
# Event 'stalled-cycles-backend'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
# Event 'instructions'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
# Event 'branches'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
# Event 'branch-misses'
#
# Data files:
# [0] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0 (Baseline)
# [1] v6.6-36-g73a29cb216c0/perf-rec-6.6.0-f2-00036-g73a29cb216c0-null
# [2] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675
# [3] v6.6-8-g067d1f1d8675/perf-rec-6.6.0-tf2-00008-g067d1f1d8675-null
# [4] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906
# [5] v6.6-23-g10f95252e906/perf-rec-6.6.0-tf2-00023-g10f95252e906-null
#
# Baseline/0 Delta Abs/1 Delta Abs/2 Delta Abs/3 Delta Abs/4 Delta Abs/5 Shared Object Symbol
# .......... ........... ........... ........... ........... ........... ............. ......
#
Does anyone here have enough experience with perf to recommend
some tests to tease out the differences?