Module vs Kernel main performance
Abu Rasheda
rcpilot2010 at gmail.com
Tue May 29 19:50:35 EDT 2012
Hi,
I am working on the x86_64 arch. I profiled a Linux kernel module with oprofile
and noticed that a whole lot of cycles are spent in copy_from_user calls.
I compared the same flow in kernel proper and noticed that, for more data
throughput, far fewer cycles are spent in copy_from_user: kernel
proper needs 1/8 of the cycles the module does. (There is a user process
that keeps sending data, like iperf.)
I used the perf tool to gather some statistics. For the call from kernel proper:
   185,719,857,837 cpu-cycles          #    3.318 GHz                  [90.01%]
    99,886,030,243 instructions        #     0.54 insns per cycle      [95.00%]
     1,696,072,702 cache-references    #   30.297 M/sec                [94.99%]
       786,929,244 cache-misses        #   46.397 % of all cache refs  [95.00%]
    16,867,747,688 branch-instructions #  301.307 M/sec                [95.03%]
        86,752,646 branch-misses       #     0.51% of all branches     [95.00%]
     5,482,768,332 bus-cycles          #   97.938 M/sec                [20.08%]
      55967.269801 cpu-clock
      55981.842225 task-clock          #    0.933 CPUs utilized
and for the call from the kernel module:
     9,388,787,678 cpu-cycles          #    1.527 GHz                  [89.77%]
     1,706,203,221 instructions        #     0.18 insns per cycle      [94.59%]
       551,010,961 cache-references    #   89.588 M/sec                [94.73%]
       369,632,492 cache-misses        #   67.083 % of all cache refs  [95.18%]
       291,358,658 branch-instructions #   47.372 M/sec                [94.68%]
        10,291,678 branch-misses       #     3.53% of all branches     [95.01%]
       582,651,999 bus-cycles          #   94.733 M/sec                [20.55%]
       6112.471585 cpu-clock
       6150.490210 task-clock          #    0.102 CPUs utilized
               367 page-faults         #    0.000 M/sec
               367 minor-faults        #    0.000 M/sec
                 0 major-faults        #    0.000 M/sec
            25,770 context-switches    #    0.004 M/sec
                23 cpu-migrations      #    0.000 M/sec
So the CPU is evidently stalling while it copies data, and there are
more cache misses. My question is: is there a difference between calling
copy_from_user from kernel proper and calling it from an LKM?