what is __get_cpu_var() ?

Dave Hylands dhylands at gmail.com
Wed Feb 23 13:30:15 EST 2011


Hi Murali,

On Wed, Feb 23, 2011 at 11:26 AM, Murali N <nalajala.murali at gmail.com> wrote:
> Hi Dave,
>
> On Wed, Feb 23, 2011 at 11:15 AM, Dave Hylands <dhylands at gmail.com> wrote:
>> Hi Murali,
>>
>> On Wed, Feb 23, 2011 at 10:34 AM, Murali N <nalajala.murali at gmail.com> wrote:
>>> Hi Dave,
>>> thanks for your reply.
>> ...snip...
>>>> get_cpu_var gives you the current CPU's copy of a per-cpu variable (it
>>>> disables preemption first, so pair it with put_cpu_var).
>>>>
>>>> __get_cpu_var contains the actual machine-dependent implementation. It
>>>> looks like all of the architectures use the one in
>>>> asm-generic/percpu.h
>>>>
>>>> In general, all of the per-cpu data is gathered together into a
>>>> section. Multiple sections are allocated (one per CPU). I think that
>>>> the address of the variable is really the offset within the section,
>>>> and each allocated section is cache-line aligned. This offset is then
>>>> added to the "offset for my cpu" to come up with the final address of
>>>> the variable, which is dereferenced as a pointer dereference. There
>>>> are lots of extra doo-dads to get around warnings, and to prevent the
>>>> linker from producing relocation references for the variable
>>>> access (since it looks like an access of a global variable, but it's
>>>> really just doing a game of using the offset of the variable within
>>>> the section).
>>>>
>>>> So you could think of it as a very fancy offsetof macro.
>>>>
>>>> There are several other macros involved, perhaps you could be a bit
>>>> more specific about your request?
>>>>
>>>> Dave Hylands
>>>>
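
To make the "fancy offsetof" idea quoted above a bit more concrete, here
is a rough userspace sketch. It is not the kernel's implementation: the
names (percpu_section, cpu_area, percpu_ptr, nr_events_on) and the
NR_CPUS value are made up for illustration, and the CPU number is passed
in by hand rather than coming from the running CPU. It only shows the
"offset within the section plus per-CPU base" arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Template "section": one field per per-cpu variable (names are made up). */
struct percpu_section {
    long nr_events;
    int  last_irq;
};

/* Base address of each CPU's private copy of the section. */
static struct percpu_section *cpu_area[NR_CPUS];

/* Allocate and zero one copy of the section per CPU. Returns 0 on success. */
static int percpu_setup(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpu_area[cpu] = calloc(1, sizeof(struct percpu_section));
        if (!cpu_area[cpu])
            return -1;
    }
    return 0;
}

/* A variable's "address" is really just its offset inside the section. */
#define PERCPU_OFFSET(field) offsetof(struct percpu_section, field)

/* Add that offset to the chosen CPU's base to get the real address. */
static void *percpu_ptr(int cpu, size_t offset)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return (char *)cpu_area[cpu] + offset;
}

/* Typed accessor for one variable: the final step is a plain dereference. */
static long *nr_events_on(int cpu)
{
    return percpu_ptr(cpu, PERCPU_OFFSET(nr_events));
}
```

After percpu_setup(), writing through nr_events_on(0) and nr_events_on(1)
touches two different copies of the same variable, even though both calls
use the same offset.
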
>>>
>>> I have one more basic question.
>>> Why would we need to maintain structures like this? Is there any
>>> advantage we get here?
>>
>> Primarily for performance reasons. For example, the kernel maintains
>> lots of stats on threads and processes (I haven't looked to see if
>> these are actually maintained on a per-cpu basis, but the concept
>> applies). These stats are updated frequently, but only accessed
>> occasionally. If you have a global "database" of stats, then each CPU
>> needs to lock the data, which creates lots of contention. By keeping
>> stuff per-cpu, the cpus don't need to acquire any locks (or at the
>> very least won't cause as much contention when acquiring per-cpu
>> locks). This becomes especially important when there are lots of cpus.
>>
>> The query functions can then amalgamate the information and present it
>> as if it were maintained in a global database.
>>
>> So if you have data which is updated frequently and only accessed
>> occasionally, or updated infrequently and accessed frequently, then
>> you might have a case for using per-cpu-data. Of course you'd still
>> need to profile it and see if it makes sense.
>>
>> Also keep in mind that something which doesn't seem to matter much on,
>> say, a dual-core could make a considerable difference with, say, 32
>> cores.
>>
>> Dave Hylands
>>
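
The "update locally, amalgamate on query" pattern quoted above could look
roughly like the following userspace sketch. Everything here is an
assumption made for illustration: the names (cpu_stat, stat_inc,
stat_total), the 64-byte cache line, and the NR_CPUS value. It also
ignores how the caller learns which CPU it is on, and a real concurrent
reader may see a total that is slightly out of date.

```c
#include <assert.h>

#define NR_CPUS    4    /* hypothetical CPU count */
#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* One counter per CPU, padded out to a full cache line. */
struct cpu_stat {
    unsigned long packets;
    char pad[CACHE_LINE - sizeof(unsigned long)];
};

/* Aligned array, so no two CPUs' slots share a cache line. */
static _Alignas(CACHE_LINE) struct cpu_stat stats[NR_CPUS];

/* Hot path: each CPU touches only its own slot, so no lock is taken. */
static void stat_inc(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    stats[cpu].packets++;
}

/* Slow path: the query walks every slot and presents a single total. */
static unsigned long stat_total(void)
{
    unsigned long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += stats[cpu].packets;
    return sum;
}
```

The trade-off is visible in the two functions: stat_inc() is cheap and
contention-free, while stat_total() gets more expensive as the number of
CPUs grows, which is why this suits data that is updated often and read
rarely.
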
>
> So it makes sense to use it if I am running on more cores ( > 4 ).

It really depends on the access patterns of the data. Whether it makes
sense or not is something you'll probably need to profile (i.e. with
and without using per-cpu variables).

Dave Hylands


