perf_event wakeup_events = 0

Theodore Dubois tbodt at google.com
Sat Sep 7 19:27:39 EDT 2019


On Sep 7, 2019, at 3:45 PM, Valdis Klētnieks <valdis.kletnieks at vt.edu> wrote:

> So an entry is made in the buffer. It's not clear that this immediately triggers
> a signal…

I think the documentation says it does when wakeup_events is 1. The code for
perf backs this up:
https://github.com/torvalds/linux/blob/a9815a4fa2fd297cab9fa7a12161b16657290293/tools/perf/util/evsel.c#L1051-L1054
The puzzle is what happens when wakeup_events is 0. The documentation saying
"more recent kernels treat 0 the same as 1" suggests it should behave the same,
but then why would perf set it to 1 after zero-initializing it?

> So you need to look at what size mmap buffer is being allocated.  It's *probably*
> on the order of megabytes, so that you can buffer a fairly large number of entries
> and not take several user/kernel transitions on every single entry…

It’s 512 KiB. Each sample is 40 bytes (the sample_type is IP | TID | TIME |
PERIOD, each of which takes 8 bytes, plus the 8-byte record header). 40 bytes
per sample * 4000 samples per second * 1.637 seconds is 261920 bytes, which is
almost exactly half the buffer (262144 bytes).
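
The arithmetic above can be checked with a short sketch. It assumes the 40
bytes are made up of the 8-byte struct perf_event_header plus four 8-byte
fields, that the kernel holds the rate at exactly sample_freq, and that nothing
but sample records lands in the buffer. The constant names are mine, not
anything from perf.

```python
# Back-of-the-envelope check of how full the ring buffer is when poll() returns.
# Assumption: each sample record = 8-byte perf_event_header + four 8-byte fields.
HEADER_BYTES = 8
FIELD_BYTES = {"IP": 8, "TID": 8, "TIME": 8, "PERIOD": 8}

SAMPLE_FREQ_HZ = 4000       # sample_freq seen in the perf_event_open call
POLL_LATENCY_S = 1.637      # time until poll() returned POLLIN in the strace
BUFFER_BYTES = 512 * 1024   # ring buffer size reported above

sample_bytes = HEADER_BYTES + sum(FIELD_BYTES.values())
bytes_written = sample_bytes * SAMPLE_FREQ_HZ * POLL_LATENCY_S
fill_fraction = bytes_written / BUFFER_BYTES

print("bytes per sample:", sample_bytes)
print("bytes written before wakeup:", round(bytes_written))
print("fraction of buffer used:", f"{fill_fraction:.4f}")
```

The fraction comes out within a tenth of a percent of one half, which is what
prompts the question below.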

So does wakeup_events = 0 mean it causes a wakeup when the buffer is half
full? I don't see anything in the man page about this.

If you'd like to try it yourself, this is the strace command I've been using:
strace -ttTv -eperf_event_open,mmap,poll -operf.strace perf record stress --cpu 1 --timeout 1

~Theodore

> 
> On Sat, 07 Sep 2019 09:14:49 -0700, Theodore Dubois said:
> 
> Reading what it actually says rather than what I thought it said.. :)
> 
>       Events come in two flavors: counting and sampled.  A counting event  is
>       one  that  is  used  for  counting  the aggregate number of events that
>       occur.  In general, counting event results are gathered with a  read(2)
>       call.   A  sampling  event periodically writes measurements to a buffer
>       that can then be accessed via mmap(2).
> 
> For some reason, I was thinking counting events.  -ENOCAFFEINE. :)
> 
>> sample_freq is 4000 (and freq is 1). Here’s the man page on this field:
>> 
>>       sample_period, sample_freq
>>              A "sampling" event is one that generates an  overflow  notifica‐
>>              tion  every N events, where N is given by sample_period.  A sam‐
>>              pling event has sample_period > 0.
> 
> There's this part:
>>              pling event has sample_period > 0.   When  an  overflow  occurs,
>>              requested  data is recorded in the mmap buffer.  The sample_type
>>              field controls what data is recorded on each overflow.
> 
> So an entry is made in the buffer. It's not clear that this immediately triggers
> a signal...
> 
>   MMAP layout
>       When using perf_event_open() in sampled mode, asynchronous events (like
>       counter overflow or PROT_EXEC mmap tracking) are logged  into  a  ring-
>       buffer.  This ring-buffer is created and accessed through mmap(2).
> 
>       The mmap size should be 1+2^n pages, where the first page is a metadata
>       page (struct perf_event_mmap_page) that contains various bits of infor‐
>       mation such as where the ring-buffer head is.
> 
> So you need to look at what size mmap buffer is being allocated.  It's *probably*
> on the order of megabytes, so that you can buffer a fairly large number of entries
> and not take several user/kernel transitions on every single entry...
> 
>> If I’m reading this right, this is a sampling event which overflows 4000 times a second.
> 
> And 4,000 entries are made in the buffer per second..
> 
>> But perf then does a poll call which wakes up on this FD with POLLIN after
>> 1.637 seconds, instead of 0.00025 seconds
> 
> At which point perf goes and looks at several thousand entries in the ring buffer...
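
To put numbers on the exchange above, here is a rough sketch of the expected
interval between overflows and of how many entries are waiting when poll
finally returns. It assumes the rate stays at exactly sample_freq for the whole
run, which is an idealization; the variable names are only illustrative.

```python
# Compare the per-overflow interval with the observed poll() latency.
# Assumption: the event overflows at exactly sample_freq for the whole interval.
SAMPLE_FREQ_HZ = 4000     # sample_freq from the perf_event_attr
OBSERVED_POLL_S = 1.637   # how long poll() actually blocked

interval_s = 1 / SAMPLE_FREQ_HZ
pending_samples = round(SAMPLE_FREQ_HZ * OBSERVED_POLL_S)

print("seconds between overflows:", interval_s)
print("entries in the buffer at wakeup:", pending_samples)
```

Under these assumptions, poll returns only after several thousand overflows
rather than after the first one.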



