what is __get_cpu_var() ?

Dave Hylands dhylands at gmail.com
Wed Feb 23 13:15:58 EST 2011


Hi Murali,

On Wed, Feb 23, 2011 at 10:34 AM, Murali N <nalajala.murali at gmail.com> wrote:
> Hi Dave,
> thanks for your reply.
...snip...
>> get_cpu_var returns the contents of a per-cpu variable.
>>
>> __get_cpu_var contains the actual machine-dependent implementation. It
>> looks like all of the architectures use the one in
>> asm-generic/percpu.h
>>
>> In general, all of the per-cpu data is gathered together into a
>> section. Multiple sections are allocated (one per CPU). I think that
>> the address of the variable is really the offset within the section,
>> and each allocated section is cache-line aligned. This offset is then
>> added to the "offset for my cpu" to come up with the final address of
>> the variable, and the result is dereferenced like an ordinary pointer.
>> There are lots of extra doo-dads to get around warnings and to prevent
>> the linker from producing relocation references for the variable
>> access (it looks like an access to a global variable, but it's really
>> just a game played with the offset of the variable within the
>> section).
>>
>> So you could think of it as a very fancy offsetof macro.
>>
>> There are several other macros involved, perhaps you could be a bit
>> more specific about your request?
>>
>> Dave Hylands
>>
>
> I have one more basic question.
> Why would we need to maintain structures like this? Is there any
> advantage we get here?

Primarily for performance reasons. For example, the kernel maintains
lots of stats on threads and processes (I haven't looked to see if
these are actually maintained on a per-cpu basis, but the concept
applies). These stats are updated frequently, but only accessed
occasionally. If you have a global "database" of stats, then each CPU
needs to lock the data, which creates lots of contention. By keeping
stuff per-cpu, the cpus don't need to acquire any locks (or at the
very least won't cause as much contention when acquiring per-cpu
locks). This becomes especially important when there are lots of cpus.

The query functions can then amalgamate the information and present it
as if it were maintained in a global database.

So if you have data which is updated frequently and only accessed
occasionally, or updated infrequently and accessed frequently, then
you might have a case for using per-cpu-data. Of course you'd still
need to profile it and see if it makes sense.

Also keep in mind that some things might not seem to matter much on,
say, a dual-core system, but could make a considerable difference
with, say, 32 cores.

Dave Hylands



More information about the Kernelnewbies mailing list