<div dir="ltr"><span style="font-family:arial,sans-serif;font-size:13px">I've been trying to work out why we're seeing frequent stalls during packet transmission in our GPFS cluster with the bnx2 driver (and with other NICs/drivers as well), but I've reached the limit of my current knowledge. I used the perf netdev events (as described in </span><a href="http://lwn.net/Articles/397654/" target="_blank" style="font-family:arial,sans-serif;font-size:13px">http://lwn.net/Articles/397654/</a><span style="font-family:arial,sans-serif;font-size:13px">) to measure the tx times, and I see spikes such as the last two rows below:</span><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px"><font face="courier new, monospace"> dev len Qdisc netdevice free<br></font></div><div style="font-family:arial,sans-serif;font-size:13px">
<div><font face="courier new, monospace"> em2 98 807740.878085sec 0.002msec 0.061msec</font></div><div><font face="courier new, monospace"> em2 98 807740.878119sec 0.002msec 0.029msec</font></div>
<div><font face="courier new, monospace"> em2 98 807741.140600sec 0.005msec 0.092msec</font></div><div><font face="courier new, monospace"> em2 65226 807742.763833sec 0.007msec 0.436msec</font></div>
<div><font face="courier new, monospace"> em2 66 807727.081712sec 0.001msec 16246.072msec</font></div><div><font face="courier new, monospace"> em2 66 807740.882741sec 0.001msec 3457.625msec</font></div>
</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Based on the source for netdev-times.py, the "free" column is the time elapsed between trace_net_dev_xmit() and trace_kfree_skb() in net/core/dev.c, but I'm not sure how to dig any deeper than that. Are there any common causes for this behavior? What's the best way to break down the time between the xmit and kfree trace points further?</div>
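<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">To make the "free" calculation concrete, here is a minimal sketch of how I understand the value to be derived: each xmit event is paired with the later free event for the same skb address, and the difference is reported in msec. This is not the actual code from netdev-times.py. The function name free_latencies, the (name, skbaddr, time) tuple layout, and the sample events are all hypothetical and only illustrate the pairing.</div>

```python
def free_latencies(events):
    """Pair each xmit event with the later free event for the same skb.

    events: iterable of (name, skbaddr, time_sec) tuples in time order.
    This tuple layout is an assumption made for illustration only.
    Returns a list of (skbaddr, latency_msec) in the order skbs were freed.
    """
    pending = {}   # skbaddr -> timestamp of its xmit event
    results = []
    for name, skbaddr, t in events:
        if name == "net_dev_xmit":
            # Remember when this skb was handed to the driver.
            pending[skbaddr] = t
        elif name == "kfree_skb" and skbaddr in pending:
            # The skb was freed: report the xmit-to-free delay in msec.
            start = pending.pop(skbaddr)
            results.append((skbaddr, (t - start) * 1000.0))
    return results


# Hypothetical events, not taken from a real trace.
sample = [
    ("net_dev_xmit", 0xA1, 10.000),
    ("net_dev_xmit", 0xB2, 10.001),
    ("kfree_skb", 0xA1, 10.002),
    ("kfree_skb", 0xB2, 13.501),
]

for addr, msec in free_latencies(sample):
    print(hex(addr), round(msec, 3), "msec")
```

<div style="font-family:arial,sans-serif;font-size:13px">If that understanding is right, a large "free" value only says that the skb was released long after it was handed to the driver. It does not say where in between the time was spent.</div>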
</div>