How to measure performance inside Kernel?
Peter Senna Tschudin
peter.senna at gmail.com
Thu Feb 9 07:58:21 EST 2012
Dear list,
I'm looking for a way to compare the performance of two versions of
the same code inside the kernel. I was able to do a comparison in user
land, but I want to measure the specific portion of code inside the kernel.
I want to compare the loop at line 1195 of
drivers/media/video/videobuf2-core.c:
    /*
     * Reinitialize all buffers for next use.
     */
    for (i = 0; i < q->num_buffers; ++i)
        q->bufs[i]->state = VB2_BUF_STATE_DEQUEUED;
with this pointer-based version:
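    /* buf2 */
    /*
     * Reinitialize all buffers for next use.
     * q->bufs is an array of pointers, so buf_ptr and buf_ptr_end
     * are declared as struct vb2_buffer **.
     */
    buf_ptr_end = &q->bufs[q->num_buffers];
    for (buf_ptr = &q->bufs[0]; buf_ptr < buf_ptr_end; ++buf_ptr)
        (*buf_ptr)->state = VB2_BUF_STATE_DEQUEUED;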
To test in user land, I created two separate C programs, compiled
them with gcc -O2, and ran the "perf" tool on each entire
application. With num_buffers = 131072:
$ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,cache-references,cache-misses \
      -r 2048 ./buf1
Performance counter stats for './buf1' (2048 runs):
    16,538,039 cycles                   #  0.000 GHz                   (+-0.06%) [80.23%]
     6,917,411 stalled-cycles-frontend  # 41.83% frontend cycles idle  (+-0.14%) [80.25%]
     4,686,384 stalled-cycles-backend   # 28.34% backend cycles idle   (+-0.14%) [80.28%]
       148,990 cache-references                                        (+-0.38%) [80.24%]
        71,180 cache-misses             # 47.775 % of all cache refs   (+-0.22%) [88.14%]
   0.005234340 seconds time elapsed
$ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,cache-references,cache-misses \
      -r 2048 ./buf2
Performance counter stats for './buf2' (2048 runs):
    14,740,563 cycles                   #  0.000 GHz                   (+-0.04%) [77.89%]
     5,187,716 stalled-cycles-frontend  # 35.19% frontend cycles idle  (+-0.14%) [77.81%]
     3,383,748 stalled-cycles-backend   #
       101,894 cache-references                                        (+-0.23%) [84.60%]
        66,647 cache-misses             # 65.408 % of all cache refs   (+-0.14%) [90.52%]
   0.004661826 seconds time elapsed                                    (+-0.06%)
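For reference, a user-land test of this kind could be structured roughly
as in the sketch below. It is an illustration only, not the exact programs
measured above: struct fake_buf, struct fake_queue, queue_alloc(),
reset_indexed(), reset_pointer(), count_dequeued() and queue_free() are
hypothetical stand-ins for the vb2 structures and loops. Because bufs holds
pointers (as in q->bufs[i]->state), the pointer variant walks the pointer
array with a struct fake_buf ** cursor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the vb2 types; only the fields used here. */
enum fake_state { FAKE_STATE_ACTIVE, FAKE_STATE_DEQUEUED };

struct fake_buf {
    enum fake_state state;
};

struct fake_queue {
    unsigned int num_buffers;
    struct fake_buf **bufs;     /* array of pointers to buffers */
};

/* Allocate n (> 0) buffers, all starting in the ACTIVE state. */
static struct fake_queue *queue_alloc(unsigned int n)
{
    struct fake_queue *q = malloc(sizeof(*q));
    unsigned int i;

    assert(q != NULL);
    q->num_buffers = n;
    q->bufs = malloc(n * sizeof(*q->bufs));
    assert(q->bufs != NULL);
    for (i = 0; i < n; ++i) {
        q->bufs[i] = malloc(sizeof(*q->bufs[i]));
        assert(q->bufs[i] != NULL);
        q->bufs[i]->state = FAKE_STATE_ACTIVE;
    }
    return q;
}

/* Variant 1 (buf1): index-based loop. */
static void reset_indexed(struct fake_queue *q)
{
    unsigned int i;

    for (i = 0; i < q->num_buffers; ++i)
        q->bufs[i]->state = FAKE_STATE_DEQUEUED;
}

/* Variant 2 (buf2): walk the pointer array with a cursor. */
static void reset_pointer(struct fake_queue *q)
{
    struct fake_buf **p = q->bufs;
    struct fake_buf **end = q->bufs + q->num_buffers;

    for (; p < end; ++p)
        (*p)->state = FAKE_STATE_DEQUEUED;
}

/* Count buffers currently in the DEQUEUED state. */
static unsigned int count_dequeued(const struct fake_queue *q)
{
    unsigned int i, n = 0;

    for (i = 0; i < q->num_buffers; ++i)
        if (q->bufs[i]->state == FAKE_STATE_DEQUEUED)
            ++n;
    return n;
}

/* Release every buffer, the pointer array, and the queue itself. */
static void queue_free(struct fake_queue *q)
{
    unsigned int i;

    for (i = 0; i < q->num_buffers; ++i)
        free(q->bufs[i]);
    free(q->bufs);
    free(q);
}
```

Each test program would call queue_alloc() once and then only one of the
two reset functions, so that "perf stat" sees a single variant per binary.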
But I want to repeat the tests on a specific portion of code, not on
the entire application. Is there a safe way to do something like:
    start_bench ( ?? ); /* start measurement */
    /* buf_ptr and buf_ptr_end are struct vb2_buffer ** (q->bufs holds pointers) */
    buf_ptr_end = &q->bufs[q->num_buffers];
    for (buf_ptr = &q->bufs[0]; buf_ptr < buf_ptr_end; ++buf_ptr)
        (*buf_ptr)->state = VB2_BUF_STATE_DEQUEUED;
    end_bench ( ?? ); /* end measurement */
And is this the correct approach for testing the performance of a
specific portion of kernel code?
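To make the idea concrete, here is a minimal user-land sketch of the kind
of region timing I have in mind, built on clock_gettime(CLOCK_MONOTONIC).
struct bench, start_bench(), end_bench() and measure_reset() are
hypothetical names, not an existing kernel API. Inside the kernel the
clock would have to come from a kernel time source instead (ktime_get()
is one candidate I am aware of), and preemption or interrupts could still
distort a single measurement, so this is only a starting point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper state: timestamp taken when the region starts. */
struct bench {
    struct timespec start;
};

/* Record the start of the measured region. */
static void start_bench(struct bench *b)
{
    int rc = clock_gettime(CLOCK_MONOTONIC, &b->start);

    assert(rc == 0);
    (void)rc;
}

/* Return the nanoseconds elapsed since start_bench(). */
static uint64_t end_bench(const struct bench *b)
{
    struct timespec end;
    int64_t sec, nsec;
    int rc = clock_gettime(CLOCK_MONOTONIC, &end);

    assert(rc == 0);
    (void)rc;
    sec = (int64_t)end.tv_sec - (int64_t)b->start.tv_sec;
    nsec = (int64_t)end.tv_nsec - (int64_t)b->start.tv_nsec;
    return (uint64_t)(sec * 1000000000LL + nsec);
}

/* Time only the reset loop, not the surrounding program. */
static uint64_t measure_reset(int *states, size_t n)
{
    struct bench b;
    size_t i;

    start_bench(&b);
    for (i = 0; i < n; ++i)
        states[i] = 0;
    return end_bench(&b);
}
```

A single run of a loop this short is likely dominated by timer overhead
and noise, so the region would need to be repeated many times and the
results averaged, much like "perf stat -r" does for whole programs.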
Thank you!
Peter
--
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36