getconf CLK_TCK and CONFIG_HZ

Jim Cromie jim.cromie at gmail.com
Wed Mar 16 01:46:43 EDT 2011


On Sun, Mar 13, 2011 at 1:20 AM, Mulyadi Santosa
<mulyadi.santosa at gmail.com> wrote:
> Dear Jim...
>
> Allow me to help you by sharing what I know so far....

thanks Mulyadi

>> This agrees with sysconf granularity :
>> $ getconf CLK_TCK
>> 100
>> but not with linux kernel HZ:
>> $ grep _HZ /boot/config-`uname -r`
>> CONFIG_HZ_1000=y
>> CONFIG_HZ=1000
>
> Not a surprise. I read somewhere that the Linux ABI so far declares that
> the clock-tick granularity is always reported as 100 HZ, i.e. 10 ms. So it
> is more a matter of "following the rule", and applications in user space
> follow this assumption, even though we know that most distros have now
> shifted to 250 or 1000 HZ.


<grumble>

I can't help but be annoyed that my distro kernel is already built with HZ_1000,
but I'm stuck with 10 ms resolution.  I did note that there's no setconf
to go along with getconf.

I note however that CLOCKS_PER_SEC seems to be what you're referring to
as the fixed number.  That it's 10,000x finer than the 10 ms tick is probably
unimportant; it's a fixed ratio, and ripe for hard-coding :-(

       clock ticks - _SC_CLK_TCK
              The number of clock ticks per second.  The corresponding
              variable is obsolete.  It was of course called CLK_TCK.
              (Note: the macro CLOCKS_PER_SEC does not give information:
              it must equal 1000000.)

It's also frustrating to read about the obsolete variable and not know why it
was obsoleted.

Having getconf is a first step, I suppose; it's possible in theory
to call sysconf(_SC_CLK_TCK) and then adjust your notion of how long
one clock_t tick is.
Perhaps in the next decade this will happen in all apps,
and then this ABI constraint can be relaxed.
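Roughly, the adjustment I mean would look like this -- just a minimal,
untested sketch, using only sysconf() and times():

#include <stdio.h>
#include <unistd.h>     /* sysconf() */
#include <sys/times.h>  /* times(), struct tms */

int main(void)
{
        /* Ask for the real tick rate instead of hard-coding 100.
         * Note CLOCKS_PER_SEC is unrelated: it is fixed at 1000000
         * and describes the unit of clock(), not of times(). */
        long ticks_per_sec = sysconf(_SC_CLK_TCK);
        struct tms t;

        times(&t);   /* fills in per-process user/system tick counts */

        printf("ticks/sec     : %ld\n", ticks_per_sec);
        printf("user time (s) : %.2f\n", (double)t.tms_utime / ticks_per_sec);
        printf("sys  time (s) : %.2f\n", (double)t.tms_stime / ticks_per_sec);
        return 0;
}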

I presume the reason for HZ_1000 is RT-ish responsiveness?
That, and the ever-increasing clock rates; more is done in 1 ms now
than used to be done in 10 ms :-)


>> Why doesn't times() also count IO-wait states for a process (and children)?
>

Just to clarify, I meant count it separately - it may be aggregated in there.
But I guess you read that into it...

> AFAIK, if you look closely into time accounting and the way the I/O code
> path behaves, you shall see that most of the time I/O is done in an async
> style... and moreover, you can't tell for sure whether only a single I/O
> code path is running or more than one at once (either interleaving
> between them or executing simultaneously).
>
> That's why, IMO, it's quite hard (but still possible) to account for
> per-process I/O wait... but still quite easy to do it system-wide.
>

System calls are executed in the kernel but run in process context, right?
Meaning there's a PID to assign the costs to.

I can understand that at some depth the async nature means that
those costs are no longer trackable to a PID; for example, the block-layer
work to service a file read...
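
For what it's worth, the kernel does export some per-process I/O counters
under /proc/<pid>/io (when built with task I/O accounting) -- a minimal
sketch that just dumps them for the current process:

#include <stdio.h>

int main(void)
{
        /* rchar/wchar count bytes passed through read()/write();
         * read_bytes/write_bytes count actual block-device traffic. */
        char line[256];
        FILE *f = fopen("/proc/self/io", "r");

        if (!f) {
                perror("/proc/self/io");  /* kernel lacks I/O accounting? */
                return 1;
        }
        while (fgets(line, sizeof(line), f))
                fputs(line, stdout);
        fclose(f);
        return 0;
}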

Does HZ_1000 mean that user jobs are scheduled out on 1 ms boundaries?
They obviously can be scheduled out earlier if they hit a sleep / IO-wait,
and interrupts push user space (and other kernel tasks too) out as well.

>> Could process-specific IO-wait numbers reveal anything about cache
>> performance?
>
> I personally think the main performance indicator of cache performance
> is cache hit utilization.

Yeah, that makes sense.  With the new perf tools,
there are powerful ways to look at those numbers.
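
For instance, beyond perf(1) itself, a process can count its own cache
misses via perf_event_open() -- an untested sketch, assuming the CPU's PMU
exposes PERF_COUNT_HW_CACHE_MISSES:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* glibc provides no wrapper for this syscall */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
        struct perf_event_attr attr;
        long long misses = 0;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CACHE_MISSES;
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        fd = perf_event_open(&attr, 0 /* self */, -1 /* any cpu */, -1, 0);
        if (fd < 0) {
                perror("perf_event_open");
                return 1;
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        /* ... the workload you care about goes here ... */

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        read(fd, &misses, sizeof(misses));
        printf("cache misses: %lld\n", misses);
        close(fd);
        return 0;
}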

>
>> Do cache misses contribute to IO-wait, or do they get counted in other ways?
>
> We're talking about page cache, right?
>
> Then sure, cache misses would trigger a hard/major page fault, assuming
> it is a file-backed page.

I wasn't really making the distinction - there are 3 kinds:

1 major page fault - the page is refetched from disk, tens of ms of wait;
   the process sleeps and gets scheduled out.
   The fault is counted by the kernel,
   is available for measurement by perf,
   and is accountable to the process, but not reported by default
   (see the getrusage() sketch below).

2 minor page fault - the page is already in memory (perhaps shared);
   it is added to the faulting process' address space / page tables.
   How long does this typically take?
   Does it result in a reschedule / context switch?

3 L1..L3 cache miss.
   IIRC, this adds 10-20x cycles for each level out from the registers,
   probably far less than a context switch.
   The kernel could perhaps switch to another thread in the same process - does it?
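
On the counting side, getrusage() already reports per-process fault totals,
so a quick look is easy -- a minimal, untested sketch:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>  /* getrusage(), struct rusage */

int main(void)
{
        /* touch some memory so we generate a few minor faults ourselves */
        size_t len = 8 * 1024 * 1024;
        char *buf = malloc(len);
        struct rusage ru;

        if (buf)
                memset(buf, 0, len);

        if (getrusage(RUSAGE_SELF, &ru) == 0) {
                printf("minor faults (no I/O): %ld\n", ru.ru_minflt);
                printf("major faults (w/ I/O): %ld\n", ru.ru_majflt);
        }
        free(buf);
        return 0;
}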


I've noticed how effective git pull is at pegging the CPU.
I've often wondered why it's so good at it -
whether it can be attributed to anything in particular,
like fewer cache misses, or maybe fewer system calls,
and therefore more productive CPU cycles.

I guess I should look at the System Monitor 2.32.0 code,
see how it measures and graphs these different things.
It seems to provide info that times() does not.

Maybe I'll run "perf stat -- git pull" next time;
of course that's not entirely repeatable/comparable to anyone else's run.

>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
>

thanks again,
Jim


