what is __get_cpu_var() ?

Wed Feb 23 13:26:22 EST 2011

Hi Dave,

On Wed, Feb 23, 2011 at 11:15 AM, Dave Hylands <dhylands at gmail.com> wrote:
> Hi Murali,
>
> On Wed, Feb 23, 2011 at 10:34 AM, Murali N <nalajala.murali at gmail.com> wrote:
>> Hi Dave,
>> thanks for your reply.
> ...snip...
>>> get_cpu_var returns the contents of a per-cpu variable.
>>>
>>> __get_cpu_var contains the actual machine-dependent implementation. It
>>> looks like all of the architectures use the one in
>>> asm-generic/percpu.h
>>>
>>> In general, all of the per-cpu data is gathered together into a
>>> section. Multiple sections are allocated (one per CPU). I think that
>>> the address of the variable is really the offset within the section,
>>> and each allocated section is cache-line aligned. This offset is then
>>> added to the "offset for my cpu" to come up with the final address of
>>> the variable, which is dereferenced as a pointer dereference. There
>>> are lots of extra doo-dads to get around warnings, and to prevent the
>>> linker from producing relocation references for the variable
>>> access (since it looks like an access of a global variable, but it's
>>> really just doing a game of using the offset of the variable within
>>> the section).
>>>
>>> So you could think of it as a very fancy offsetof macro.
>>>
>>> There are several other macros involved, perhaps you could be a bit
>>> more specific about your request?
>>>
>>> Dave Hylands
>>>
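
To check my understanding of the offset arithmetic described above, here
is a rough user-space sketch in plain C. It is not the kernel code: the
names percpu_section, cpu_base, percpu_init, percpu_addr, PERCPU_COUNTER
and SKETCH_NR_CPUS are all made up for illustration, and calloc() stands
in for whatever the kernel does to allocate (and cache-line align) each
CPU's copy of the section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Template "section": each per-cpu variable is a field, so its
 * "address" is really just its offset within the section. */
struct percpu_section {
    long counter;
    int  state;
};

/* Base address of each CPU's private copy of the section. */
static char *cpu_base[SKETCH_NR_CPUS];

/* Allocate one zeroed copy of the section per CPU.
 * Returns 0 on success, -1 if an allocation fails. */
static int percpu_init(void)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        cpu_base[cpu] = calloc(1, sizeof(struct percpu_section));
        if (!cpu_base[cpu])
            return -1;
    }
    return 0;
}

/* Final address = "offset for my cpu" (the base) + offset of the variable. */
static void *percpu_addr(size_t offset, int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    return cpu_base[cpu] + offset;
}

/* Typed accessor for the 'counter' variable on a given CPU. */
#define PERCPU_COUNTER(cpu) \
    ((long *)percpu_addr(offsetof(struct percpu_section, counter), (cpu)))
```

After percpu_init() succeeds, *PERCPU_COUNTER(0) and *PERCPU_COUNTER(1)
name two different longs at the same offset in two different copies,
which is the "fancy offsetof" idea as I understand it.
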
>>
>> I have one more basic question.
>> Why would we need to maintain structures like this? Is there any
>> advantage we get here?
>
> Primarily for performance reasons. For example, the kernel maintains
> lots of stats on threads and processes (I haven't looked to see if
> these are actually maintained on a per-cpu basis, but the concept
> applies). These stats are updated frequently, but only accessed
> occasionally. If you have a global "database" of stats, then each CPU
> needs to lock the data, which creates lots of contention. By keeping
> stuff per-cpu, the cpus don't need to acquire any locks (or at the
> very least won't cause as much contention when acquiring per-cpu
> locks). This becomes especially important when there are lots of cpus.
>
> The query functions can then amalgamate the information and present it
> as if it were maintained in a global database.
>
> So if you have data which is updated frequently and only accessed
> occasionally, or updated infrequently and accessed frequently, then
> you might have a case for using per-cpu-data. Of course you'd still
> need to profile it and see if it makes sense.
>
> Also keep in mind that something might not seem to matter much on,
> say, a dual-core system, but could make a considerable difference
> with, say, 32 cores.
>
> Dave Hylands
>
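
The "update locally, amalgamate on query" pattern above could be sketched
in user space roughly like this. It is only an illustration under my own
assumptions: cpu_stat, stats, stat_inc, stat_total, SKETCH_NR_CPUS and the
64-byte SKETCH_CACHE_LINE are hypothetical names and values, the caller
passes the CPU number explicitly, and nothing here deals with preemption
or with a reader racing against a writer. In the kernel itself the data
would be declared with DEFINE_PER_CPU and reached through macros such as
get_cpu_var()/put_cpu_var() and per_cpu().

```c
#include <assert.h>

#define SKETCH_NR_CPUS    4    /* hypothetical CPU count */
#define SKETCH_CACHE_LINE 64   /* assumed cache-line size in bytes */

/* One slot per CPU, padded out to SKETCH_CACHE_LINE bytes so that
 * consecutive slots are a full cache line apart. */
struct cpu_stat {
    unsigned long events;
    char pad[SKETCH_CACHE_LINE - sizeof(unsigned long)];
};

static struct cpu_stat stats[SKETCH_NR_CPUS];

/* Hot path: each CPU touches only its own slot, so no shared lock
 * is taken here. */
static void stat_inc(int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    stats[cpu].events++;
}

/* Cold path: the query walks every slot and presents one total,
 * as if the count had been kept in a single global variable. */
static unsigned long stat_total(void)
{
    unsigned long total = 0;

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        total += stats[cpu].events;
    return total;
}
```

The cost moves from the frequent update to the occasional query, which
has to visit every CPU's slot; that trade only pays off when updates
greatly outnumber queries.
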

So it makes sense to use per-cpu data if I am running on more cores (> 4).

-- 
Regards,
Murali N