Current and correct CPU clock and asm("cpuid")

Peter Senna Tschudin peter.senna at gmail.com
Mon Oct 3 14:57:27 EDT 2011


Hi Peter,

Thanks for the reply. I've realized that I don't need to convert the
arbitrary number into something like seconds, because I'm only
interested in comparing the values.

Is it safe to say that, if I skip the division by CPU_THOUSAND_HZ,
I get the number of clock cycles that were "spent" between the
calls to getticks() (including some for getticks() itself)?

Please see below.

Thank you!

Peter

On Mon, Oct 3, 2011 at 1:17 PM, Peter Teoh <htmldeveloper at gmail.com> wrote:
> Why not put a sleep(1) here, like this:
>
>>        ticks tickBegin, tickEnd;
>>        tickBegin = getticks();
>>
>
> sleep(1);
>
>>
>>        tickEnd = getticks();
>>        double time = (tickEnd-tickBegin)/CPU_THOUSAND_HZ;
>>
> Then you know that it is reading the TSC values for 1 sec.   And by
> running the same program on different systems you will get different
> "time" values, and then you divide by that value for THAT system, so
> that eventually running the same program on a different system will
> give you the same difference of ticks, which in our present case is "1".
> After this "normalization", you can run your system with any timing
> difference, and the maximum achievable resolution is of course 1 sec.
> Is that what you wanted?

That sounds like a great idea, but:
 - Could a dynamic clock rate and multiple CPU cores interfere with
your proposal?
 - How precise is sleep() about sleeping for exactly 1 second?
 - I hope your proposal defeats the CPU's out-of-order execution
mechanism, so that the instructions run in the order we expect
(tickBegin -> sleep -> tickEnd). How can we be sure that the
instructions ran in the correct order?

>
> BTW, modern OSes do not use the TSC any more, but yes, your assembly can
> still access and read the TSC.   The OS usually reads from the HPET (which
> is how sleep(1) calculates the time differences), and to read the HPET
> here is a link:
>
> http://www.fftw.org/cycle.h

Looking at cycle.h, I found this familiar code (it starts on line 216):

/*----------------------------------------------------------------*/
/*
 * X86-64 cycle counter
 */

static __inline__ ticks getticks(void)
{
     unsigned a, d;
     asm volatile("rdtsc" : "=a" (a), "=d" (d));
     return ((ticks)a) | (((ticks)d) << 32);
}

The code in cycle.h is so similar to the one I was using that I
guess both were written by the same author. I got the code
I'm using from the paper at:
http://people.virginia.edu/~chg5w/page3/assets/MeasuringUnix.pdf

>
> And query the OS via:
>
> cat /sys/devices/system/clocksource/clocksource0/*
> hpet acpi_pm
> hpet
>
> and you can see from above that "tsc" is missing from my system.
> (linux kernel is 2.6.35-22)
>
> For TSC, I am not sure what the highest resolution you can get is, but in
> a modern SoC chip with a 600 MHz core speed (speaking of PowerPC
> http://en.wikipedia.org/wiki/PowerPC_e500), the fastest execution is
> 600 million instructions per sec, assuming one instruction
> per clock.   With this kind of speed, TSC is very bad for measuring
> time differences.

This is my mistake. I did not tell you that my tests will run only on
the x86 architecture.

>
> On Mon, Oct 3, 2011 at 9:27 AM, Peter Senna Tschudin
> <peter.senna at gmail.com> wrote:
>> Dear list members,
>>
>> I'm following:
>>
>> http://people.virginia.edu/~chg5w/page3/assets/MeasuringUnix.pdf
>>
>> And I'm trying to measure the execution time of simple operations with RDTSC.
>>
>> See the code below:
>>
>> #include <stdio.h>
>> #define CPU_THOUSAND_HZ 800000
>> typedef unsigned long long ticks;
>> static __inline__ ticks getticks(void) {
>>        unsigned a, d;
>>        asm("cpuid");
>>        asm volatile("rdtsc" : "=a" (a), "=d" (d));
>>        return (((ticks)a) | (((ticks)d) << 32));
>> }
>>
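>> int main(void) {
>>        ticks tickBegin, tickEnd;
>>        tickBegin = getticks();
>>
>>        // code to time
>>
>>        tickEnd = getticks();
>>        // Cast first so the division is done in floating point
>>        double time = (double)(tickEnd - tickBegin) / CPU_THOUSAND_HZ;
>>
>>        // %e matches a double; %Le would expect a long double
>>        printf("%e\n", time);
>>        return 0;
>> }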
>>
>> How can the C code detect the correct value for CPU_THOUSAND_HZ? The
>> problems I see are:
>>  - The information must be collected for the CPU that will run
>> the process. On Core i7 processors, different cores can run at
>> different clock speeds at the same time.
>>  - If the clock changes during the execution of the process, what should
>> it do? When is the best time to collect the clock speed?
>>
>> The authors of the paper are not sure about the effects of
>> "asm("cpuid");". Does it ensure that the entire process will run on the
>> same CPU, and does it serialize execution, avoiding out-of-order
>> execution by the CPU?
>>
>> Thank you very much! :-)
>>
>> Peter
>>
>>
>> --
>> Peter Senna Tschudin
>> peter.senna at gmail.com
>> gpg id: 48274C36
>>
>> _______________________________________________
>> Kernelnewbies mailing list
>> Kernelnewbies at kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>
>
>
>
> --
> Regards,
> Peter Teoh
>



-- 
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36


