using prefetch
michi1 at michaelblizek.twilightparadox.com
Fri Feb 15 11:42:25 EST 2013
Hi!
On 12:16 Fri 15 Feb , Kevin Wilson wrote:
...
> AFAIK, what prefetch does is get a variable from memory and put it in
> cache (L2 cache I believe).
Yes, this is true. See:
http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Other-Builtins.html
I am not so sure about the cache level it is fetched to.
> Is the prefetch operation synchronous? I mean, after calling it, are
> we guaranteed that the variable is
> indeed in the cache ?
No, the variable is definitely not guaranteed to be in the cache. That would
not make any sense: the purpose of prefetch is to fetch data in the background
while the cpu executes something else.
Actually, it is not guaranteed to fetch anything at all. The target cpu might
not support the feature. Even if it does, there are cases where the data will
not be prefetched, e.g. when the access would trigger a page fault. The cpu
itself might also decide not to do the prefetch, e.g. when the cache line is
present (and locked by cache coherency) in the cache of a different cpu/core.
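
To illustrate that prefetch is only a hint, here is a minimal sketch using
GCC's __builtin_prefetch() from the page linked above. The node type and
list_sum() are made up for this example. The second and third arguments are
the documented rw (0 = read) and locality (3 = keep in all cache levels)
hints. According to the GCC documentation, prefetching an invalid address
(here: a NULL next pointer at the end of the list) does not fault, as long as
evaluating the address expression itself is valid.

```c
#include <stddef.h>

/* Hypothetical list node, for illustration only. */
struct node {
    struct node *next;
    int value;
};

/* Sum all values in a singly linked list.
 * The prefetch is only a hint: the result is the same whether or not the
 * cpu acts on it, and a NULL n->next does not cause a fault. */
long list_sum(const struct node *head)
{
    long sum = 0;

    for (const struct node *n = head; n != NULL; n = n->next) {
        __builtin_prefetch(n->next, 0, 3); /* read access, high locality */
        sum += n->value;
    }
    return sum;
}
```

This only shows the semantics. There is almost no work between the prefetch
and the use of the next node, so do not expect it to be faster than the plain
loop.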
> So this is probably for improving performance, assuming that you will
> need this variable in the near
> future.
> The comment there says:
> /* prefetch skb_end_pointer() to speedup skb_shinfo(skb) */
>
> According to this logic, anywhere that we want to call skb_shinfo(skb)
> we better do a prefetch before.
>
> In fact, if we prefetch any variable that we want to use then we end up
> with performance boost.
>
> So - any hints, what are the guidelines for using prefetch()?
You really should *not* prefetch() all variables you want to use. Prefetch
itself generates code which needs cpu cycles. It can quickly make your program
slower. Use it only in places where
- the data is very unlikely to be in the cache of either the current or any
other cpu in the system *and*
- you can add the prefetch instruction at least 100ns before the actual use
Also, if you access a reasonably large memory array sequentially (either
forward or backward), you should not use prefetch() at all. The cpus have
hardware prefetchers which are faster in this case.
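
The pattern described above (issue the prefetch early, do unrelated work,
then touch the data) could look roughly like the sketch below. struct buf and
checksum_then_trailer() are hypothetical names, only loosely modelled on the
skb_end_pointer() case from the quoted comment. Whether the work in between
is long enough to hide the memory latency is an assumption which only a
measurement can confirm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer: the trailer lives far away from the payload start,
 * so it is unlikely to share a cache line with it. */
struct buf {
    uint8_t  *data;     /* start of payload */
    size_t    len;      /* payload length in bytes */
    uint32_t *trailer;  /* metadata read only after the payload is processed */
};

uint32_t checksum_then_trailer(const struct buf *b)
{
    /* 1. Issue the hint as early as possible. */
    __builtin_prefetch(b->trailer, 0, 3);

    /* 2. Do work which does not depend on the trailer. This loop is a
     *    sequential array walk, so it is left to the hardware prefetcher. */
    uint32_t sum = 0;
    for (size_t i = 0; i < b->len; i++)
        sum += b->data[i];

    /* 3. Only now touch the prefetched data. */
    return sum ^ *b->trailer;
}
```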
General advice for performance optimisation: run benchmarks.
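
A very small userspace timing harness could look like the sketch below. All
names (best_of(), struct sum_job, run_sum_job()) are invented for this
example. It uses clock() from standard C, which measures cpu time with
limited resolution, so the workload should run long enough to be measurable.
It takes the fastest of several runs to reduce noise. Run it once with and
once without the prefetch and compare.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: sum an array and remember the result. */
struct sum_job {
    const int *data;
    size_t     n;
    long       result;
    int        calls;   /* how often the job has been run */
};

void run_sum_job(void *arg)
{
    struct sum_job *job = arg;
    long sum = 0;

    for (size_t i = 0; i < job->n; i++)
        sum += job->data[i];
    job->result = sum;
    job->calls++;
}

/* Run fn(arg) reps times and return the fastest run in seconds of cpu time.
 * Returns a negative value if reps is not positive. */
double best_of(void (*fn)(void *), void *arg, int reps)
{
    double best = -1.0;

    for (int i = 0; i < reps; i++) {
        clock_t t0 = clock();
        fn(arg);
        clock_t t1 = clock();
        double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;

        if (best < 0.0 || secs < best)
            best = secs;
    }
    return best;
}
```

Usage would be something like best_of(run_sum_job, &job, 10) for each variant
of the code under test, with a data set large enough not to fit in the cache.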
-Michi
--
programming a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com