using prefetch

Kevin Wilson wkevils at gmail.com
Fri Feb 15 15:18:43 EST 2013


Thanks!
KW


On Fri, Feb 15, 2013 at 6:42 PM,
<michi1 at michaelblizek.twilightparadox.com> wrote:
> Hi!
>
> On 12:16 Fri 15 Feb     , Kevin Wilson wrote:
> ...
>> AFAIK, what prefetch does is get a variable from memory and put it in
>> cache (L2 cache I believe).
>
> Yes, this is true. See:
> http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Other-Builtins.html
> I am not so sure about the cache level it is fetched to.
>
>> Is the prefetch operation synchronous? I mean, after calling it, are
>> we guaranteed that the variable is
>> indeed in the cache?
>
> No, the variable is definitely not guaranteed to be in the cache. That would
> not make any sense. The purpose of the prefetch is to fetch data in the
> background while executing something else.
>
> Actually it is not guaranteed to fetch anything at all. The target cpu might
> not support the feature at all. Even if it does there are cases where it will
> not be prefetched, e.g. when it triggers a page fault. Also the cpu itself
> might decide not to do the prefetch, e.g. when the cache line is present (and
> locked by cache coherency) in the cache of a different cpu/core.
>
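To make the "hint, not a guarantee" point concrete, here is a minimal sketch using GCC's __builtin_prefetch (the builtin documented at the link above). struct node and sum_list() are hypothetical names made up for this illustration; they are not kernel code. The loop produces the same result whether or not the CPU acts on the hint, which is the only safe way to write code around a prefetch.

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
    struct node *next;
    long value;
};

/* Sum a linked list, hinting the next node while working on the current one. */
long sum_list(const struct node *head)
{
    long total = 0;

    for (const struct node *n = head; n != NULL; n = n->next) {
        /*
         * Only a hint: the arguments are (address, rw = 0 for read,
         * locality = 3). The call does not wait for the data, and a
         * NULL n->next is harmless because the builtin does not fault
         * on an invalid address.
         */
        __builtin_prefetch(n->next, 0, 3);

        /* Correct whether or not the hint had any effect. */
        total += n->value;
    }
    return total;
}
```

With this little work per node the hint has almost no time to complete before n->next is dereferenced, so this shows the calling pattern rather than a likely speedup.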
>> So this is probably for improving performance, assuming that you will
>> need this variable in the near
>> future.
>> The comment there says:
>> /* prefetch skb_end_pointer() to speedup skb_shinfo(skb) */
>>
>> According to this logic, anywhere that we want to call skb_shinfo(skb)
>> we had better do a prefetch first.
>>
>> In fact, if we prefetch any variable that we want to use, then we end up
>> with a performance boost.
>>
>> So - any hints, what are the guidelines for using prefetch()?
>
> You really should *not* prefetch() all variables you want to use. Prefetch
> itself generates code which needs cpu cycles. It can quickly make your program
> slower. Use it only in places where
> - the data is very unlikely to be in the cache of either the current or any
>   other cpu in the system *and*
> - you can add the prefetch instruction at least 100ns before the actual use
>
> Also, if you access a reasonably large memory array sequentially (either
> forward or backward), you should not use prefetch() at all. The cpus have
> hardware prefetchers which are faster in this case.
>
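The guidelines above can be sketched as follows, assuming GCC or Clang. sum_scattered() reads a table through an index array, an access pattern the hardware prefetchers cannot predict, and issues the hint a few iterations ahead so there is other work between the hint and the use. sum_sequential() walks the table in order and deliberately issues no hint. Both function names and the PREFETCH_AHEAD value are hypothetical; the right distance depends on the machine and has to be measured.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance; tune it by benchmarking. */
#define PREFETCH_AHEAD 8

/*
 * Sum table[idx[i]] for scattered indices.
 * The caller must guarantee that every idx[i] is a valid index into table.
 */
long sum_scattered(const long *table, const size_t *idx, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        /*
         * idx[] itself is read sequentially, so it is left to the
         * hardware prefetcher. Only the unpredictable table access
         * gets an explicit hint, several iterations before its use.
         */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&table[idx[i + PREFETCH_AHEAD]]);

        total += table[idx[i]];
    }
    return total;
}

/* Sequential pass: no explicit prefetch, the hardware handles this case. */
long sum_sequential(const long *table, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += table[i];
    return total;
}
```

On a table that already fits in cache the extra instructions in sum_scattered() are pure overhead, so any gain is only to be expected on data sets much larger than the cache.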
>
> General advice for performance optimisation: run benchmarks.
>
>         -Michi
> --
> programming a layer 3+4 network protocol for mesh networks
> see http://michaelblizek.twilightparadox.com
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


