How to measure performance inside Kernel?

Peter Senna Tschudin peter.senna at gmail.com
Thu Feb 9 07:58:21 EST 2012


Dear list,

I'm looking for a way to compare the performance of two different
pieces of code inside the kernel. I was able to do some comparisons in
user land, but I want to test this specific portion of code inside the
kernel.

I want to compare the code at line 1195 of
drivers/media/video/videobuf2-core.c:
/*
 * Reinitialize all buffers for next use.
 */
for (i = 0; i < q->num_buffers; ++i)
       q->bufs[i]->state = VB2_BUF_STATE_DEQUEUED;

With:

/* buf2 */
/*
 * Reinitialize all buffers for next use.
 */
/* q->bufs is an array of pointers, so buf_ptr is a struct vb2_buffer ** */
buf_ptr_end = q->bufs + q->num_buffers;

for (buf_ptr = q->bufs; buf_ptr < buf_ptr_end; ++buf_ptr)
       (*buf_ptr)->state = VB2_BUF_STATE_DEQUEUED;

To test in user land I created two separate C source files, compiled
them with gcc -O2, and then ran the "perf" tool on the entire
application. With num_buffers = 131072:

$ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,cache-references,cache-misses \
    -r 2048 ./buf1

Performance counter stats for './buf1' (2048 runs):

16,538,039 cycles                   #  0.000 GHz                    (+-0.06%) [80.23%]
 6,917,411 stalled-cycles-frontend  # 41.83% frontend cycles idle   (+-0.14%) [80.25%]
 4,686,384 stalled-cycles-backend   # 28.34% backend  cycles idle   (+-0.14%) [80.28%]
   148,990 cache-references                                         (+-0.38%) [80.24%]
    71,180 cache-misses             # 47.775 % of all cache refs    (+-0.22%) [88.14%]

0.005234340 seconds time elapsed

$ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,cache-references,cache-misses \
    -r 2048 ./buf2

Performance counter stats for './buf2' (2048 runs):

14,740,563 cycles                   #  0.000 GHz                    (+-0.04%) [77.89%]
 5,187,716 stalled-cycles-frontend  # 35.19% frontend cycles idle   (+-0.14%) [77.81%]
 3,383,748 stalled-cycles-backend   #
   101,894 cache-references                                         (+-0.23%) [84.60%]
    66,647 cache-misses             # 65.408 % of all cache refs    (+-0.14%) [90.52%]

0.004661826 seconds time elapsed                             (+-0.06%)
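
The two user-land test programs themselves are not shown in this
message. As a rough sketch of what such a test could look like, the
code below builds a queue in user space and resets it with both loop
styles. The names struct queue, struct buf and BUF_STATE_* are
hypothetical stand-ins, not the real videobuf2 definitions, and the
pointer loop walks the pointer array itself, on the assumption
(suggested by q->bufs[i]->state) that bufs holds pointers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-land stand-ins, NOT the real videobuf2 types. */
enum buf_state { BUF_STATE_QUEUED = 0, BUF_STATE_DEQUEUED };

struct buf {
	enum buf_state state;
};

struct queue {
	unsigned int num_buffers;
	struct buf **bufs;	/* array of pointers to separately allocated buffers */
};

/* Release every buffer and the pointer array. */
static void queue_free(struct queue *q)
{
	unsigned int i;

	assert(q != NULL);
	for (i = 0; i < q->num_buffers; ++i)
		free(q->bufs[i]);
	free(q->bufs);
	q->bufs = NULL;
	q->num_buffers = 0;
}

/* Allocate n zeroed buffers (state QUEUED); returns 0 on success, -1 on failure. */
static int queue_init(struct queue *q, unsigned int n)
{
	assert(q != NULL);
	q->num_buffers = 0;
	q->bufs = calloc(n, sizeof(*q->bufs));
	if (!q->bufs)
		return -1;
	while (q->num_buffers < n) {
		q->bufs[q->num_buffers] = calloc(1, sizeof(struct buf));
		if (!q->bufs[q->num_buffers]) {
			queue_free(q);
			return -1;
		}
		q->num_buffers++;
	}
	return 0;
}

/* Variant 1 (buf1): index-based loop. */
static void reset_by_index(struct queue *q)
{
	unsigned int i;

	for (i = 0; i < q->num_buffers; ++i)
		q->bufs[i]->state = BUF_STATE_DEQUEUED;
}

/* Variant 2 (buf2): walk the pointer array with a moving pointer. */
static void reset_by_pointer(struct queue *q)
{
	struct buf **buf_ptr;
	struct buf **buf_ptr_end = q->bufs + q->num_buffers;

	for (buf_ptr = q->bufs; buf_ptr < buf_ptr_end; ++buf_ptr)
		(*buf_ptr)->state = BUF_STATE_DEQUEUED;
}

/* Count buffers in the DEQUEUED state, to check that both variants agree. */
static unsigned int count_dequeued(const struct queue *q)
{
	unsigned int i, n = 0;

	for (i = 0; i < q->num_buffers; ++i)
		if (q->bufs[i]->state == BUF_STATE_DEQUEUED)
			n++;
	return n;
}
```

A small main() that calls queue_init(&q, 131072) and then one of the
two reset functions would give perf stat something to run, although
allocation and process startup are then part of every measurement.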

But I want to repeat the tests on a specific portion of code, not on
the entire application. Is there a safe way to do something like:

start_bench ( ?? ); /* start measurement */

/* q->bufs is an array of pointers, so buf_ptr is a struct vb2_buffer ** */
buf_ptr_end = q->bufs + q->num_buffers;

for (buf_ptr = q->bufs; buf_ptr < buf_ptr_end; ++buf_ptr)
       (*buf_ptr)->state = VB2_BUF_STATE_DEQUEUED;

end_bench ( ?? ); /* end measurement */

And is this the correct approach for testing the performance of a
specific portion of kernel code?
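
To make the start_bench()/end_bench() idea concrete, here is a minimal
user-land sketch, assuming a monotonic clock read on each side of the
region and an accumulator for repeated runs. struct bench and
bench_mean_ns() are hypothetical names invented for this illustration.
Inside the kernel the clock_gettime() calls would have to be replaced
by a kernel time source such as ktime_get(); that substitution is an
assumption and has not been tested here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical accumulator for repeated measurements of one code region. */
struct bench {
	struct timespec start;	/* timestamp taken by start_bench() */
	uint64_t total_ns;	/* sum of all measured intervals */
	uint64_t samples;	/* number of completed intervals */
};

/* Record the time at the start of the measured region. */
static void start_bench(struct bench *b)
{
	assert(b != NULL);
	clock_gettime(CLOCK_MONOTONIC, &b->start);
}

/* Close the interval, add it to the accumulator, return its length in ns. */
static uint64_t end_bench(struct bench *b)
{
	struct timespec end;
	int64_t ns;

	assert(b != NULL);
	clock_gettime(CLOCK_MONOTONIC, &end);
	ns = (int64_t)(end.tv_sec - b->start.tv_sec) * 1000000000LL
	     + (end.tv_nsec - b->start.tv_nsec);
	b->total_ns += (uint64_t)ns;
	b->samples++;
	return (uint64_t)ns;
}

/* Mean interval length in ns, or 0 if nothing has been measured yet. */
static uint64_t bench_mean_ns(const struct bench *b)
{
	assert(b != NULL);
	return b->samples ? b->total_ns / b->samples : 0;
}
```

A single interval around a short loop is easily distorted by
interrupts or by being scheduled out, so accumulating many samples and
looking at the mean (or the minimum) is usually more informative than
one reading.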

Thank you!

Peter



-- 
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36


