Finding CPU cycles with PMU registers

Anubhav Sharma anubhav at cse.iitb.ac.in
Thu Apr 14 11:47:10 EDT 2016


Hello,

I'm trying to learn how the Performance Monitoring Unit (PMU) works in 
Intel Core x86 systems. For example, I want to create a kernel module 
that counts the number of CPU cycles of a process, the PID of which is 
provided as a parameter. The same thing can be done by using perf:

perf stat -e cycles ./testProgram

The module init function has the following program flow:

1. I register an NMI handler function 'pmc_handler' to detect when a 
counter overflows.
     apic_write(APIC_LVTPC, APIC_DM_NMI);
     register_nmi_handler(NMI_LOCAL, pmc_handler, 0, "perf_handler");

2. For every online CPU, I reset the following MSRs by writing 0x0.
     for_each_online_cpu(cpu)
     {
         wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_CTRL, 0x0);
         wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_OVF_CTRL, 0x0);
         wrmsr64_safe_on_cpu(cpu, IA32_PERF_FIXED_CTR_CTRL, 0x0);

         wrmsr64_safe_on_cpu(cpu, IA32_PMC0, 0x0);
         wrmsr64_safe_on_cpu(cpu, IA32_PERFEVTSEL0, 0x0);
     }

3. For every online CPU:
     a. Write 1 to bits 0 and 62 of IA32_PERF_GLOBAL_OVF_CTRL to clear 
the corresponding overflow indicators.
         wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_OVF_CTRL,
                             (0x1 << 0) | ((u64) 0x1 << 62));

     b. Program IA32_PERFEVTSEL0 to count unhalted CPU cycles in both 
user and OS mode, with the counter and its overflow interrupt enabled.
         wrmsr64_safe_on_cpu(cpu, IA32_PERFEVTSEL0,
                             (u64) INST_UNHALTED | INT_ENABLE |
                             COUNTER_ENABLE | USR_MODE | OS_MODE);

     c. Write (u64)-999 (counterVal below) to the IA32_PMC0 counter.
         wrmsr64_safe_on_cpu(cpu, IA32_PMC0, counterVal);
        I'm not sure how step 3c works. Presumably the counter is 
preloaded so that it overflows more often. I could really use an 
explanation of this part.
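
To make the arithmetic behind step 3c concrete, here is a small user-space sketch (not kernel code) of what preloading a counter with -999 means. It assumes a 48-bit wide counter, which is an assumption on my part; the real width is reported by CPUID leaf 0xA. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter width; real hardware reports it via CPUID leaf 0xA. */
#define PMC_WIDTH_BITS 48

/* Value the counter holds after `preload` is truncated to the counter width. */
static uint64_t pmc_stored_value(int64_t preload)
{
    const uint64_t mask = (UINT64_C(1) << PMC_WIDTH_BITS) - 1;
    return (uint64_t)preload & mask;
}

/* Number of increments until the counter wraps from all-ones to zero. */
static uint64_t events_until_overflow(int64_t preload)
{
    const uint64_t mask = (UINT64_C(1) << PMC_WIDTH_BITS) - 1;
    uint64_t stored = pmc_stored_value(preload);

    assert(stored != 0); /* a zero preload would need a full 2^48 events */
    return (mask - stored) + 1;
}
```

Under that assumption, events_until_overflow(-999) is 999: the counter starts 999 steps short of wrapping, so the overflow interrupt would be raised after 999 counted events instead of after 2^48.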

Since the module is supposed to count the CPU cycles of a particular 
process, it needs to know when that process is running and when it is 
waiting. To get that information, I have added a hook in the 
__schedule() function in kernel/sched/core.c. To know when the process 
terminates, I've added a hook in the do_exit() function in 
kernel/exit.c.

Now, whenever our process is scheduled onto a CPU, I enable the PMU 
counter by writing 0x1 to IA32_PERF_GLOBAL_CTRL on every CPU. 
Similarly, whenever our process is scheduled out, I disable the counter 
by writing 0x0 to IA32_PERF_GLOBAL_CTRL. This setup should count the 
CPU cycles of only the process we are interested in.
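
The intent of the scheduler hook can be sketched as a user-space model. This is only an illustration: the slice data, the struct, and the function names are hypothetical, and the model looks at a single CPU with one counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One scheduling slice: which PID ran and for how many cycles (made-up data). */
struct slice {
    int pid;
    uint64_t cycles;
};

/* Value the hook would write to IA32_PERF_GLOBAL_CTRL when next_pid is scheduled in. */
static uint64_t global_ctrl_for(int next_pid, int target_pid)
{
    return next_pid == target_pid ? 0x1 : 0x0; /* bit 0 enables PMC0 */
}

/* Cycles the counter would see if it only runs while the target PID is on the CPU. */
static uint64_t count_target_cycles(const struct slice *s, size_t n, int target_pid)
{
    uint64_t counted = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t ctrl = global_ctrl_for(s[i].pid, target_pid);

        if (ctrl & 0x1) /* counter enabled for this slice */
            counted += s[i].cycles;
    }
    return counted;
}
```

For a timeline such as {42, 100}, {7, 500}, {42, 250} with target PID 42, the model counts 350 cycles and ignores the 500 cycles spent in PID 7.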

At this point we have a system where the counters are enabled while our 
process is running on a CPU. Also, whenever the counter overflows, the 
pmc_handler function is called (because that's what we set up in 
step 1).

In the pmc_handler function, I do the following:
     1. Disable the counters temporarily before reading from them.
         wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_CTRL, 0x0);
     2. Increase total count by the value of the counter.
         totalCount += read_msrs_on_cpu(cpu, IA32_PMC0);
     3. Enable counters again.
         wrmsr64_safe_on_cpu(cpu, IA32_PERF_GLOBAL_CTRL, 0x1);


When the process terminates, i.e. when the exit hook is called for our 
process, I output the value of totalCount.

This setup seems correct to me, but the value of totalCount and the 
value perf gives me are vastly different (56,000 [my module] vs. 
700,000 [perf]).

(For perf I'm reading r003c, which is the same as INST_UNHALTED (0x003c) 
as per the Intel Core performance manual.)
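
For reference, here is a sketch of how a perf raw code such as r003c relates to the value written to IA32_PERFEVTSEL0. The bit positions follow the architectural IA32_PERFEVTSELx layout (event select in bits 0-7, unit mask in bits 8-15, USR bit 16, OS bit 17, INT bit 20, EN bit 22). The macro and function names are hypothetical and are not the ones used in my module.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits of IA32_PERFEVTSELx (names chosen for this sketch). */
#define EVTSEL_USR (UINT64_C(1) << 16) /* count in user mode */
#define EVTSEL_OS  (UINT64_C(1) << 17) /* count in kernel mode */
#define EVTSEL_INT (UINT64_C(1) << 20) /* raise an interrupt on overflow */
#define EVTSEL_EN  (UINT64_C(1) << 22) /* enable the counter */

/* Low byte of a perf raw code is the event select. */
static uint8_t raw_event(uint16_t raw)
{
    return (uint8_t)(raw & 0xFF);
}

/* High byte of a perf raw code is the unit mask. */
static uint8_t raw_umask(uint16_t raw)
{
    return (uint8_t)(raw >> 8);
}

/* Combine a 16-bit raw code (umask:event) with flag bits into an EVTSEL value. */
static uint64_t evtsel_from_raw(uint16_t raw, uint64_t flags)
{
    assert((flags & 0xFFFF) == 0); /* flags must not overlap event/umask */
    return (uint64_t)raw | flags;
}
```

With these definitions, r003c splits into event 0x3c and umask 0x00, and combining it with the USR, OS, INT and EN flags gives 0x53003c.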

Can anyone please help me with this? I've been stuck on this for quite 
some time. I suspect there could be a conceptual flaw in the whole setup.

Thanks and regards,
Anubhav Sharma
(Kernel Novice)




