How to analyze kernel Oops dump
Peter Teoh
htmldeveloper at gmail.com
Wed Feb 6 00:24:50 EST 2013
Perhaps I can help; let me try:
The cause of the crash is here:
[ 493.113464] Unable to handle kernel paging request at virtual address f6b9f777
[ 493.124298] pgd = ec4c4000
[ 493.127166] [f6b9f777] *pgd=00000000
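i.e., the first-level (pgd) entry covering virtual address 0xf6b9f777, in the page directory at 0xec4c4000, is zero: nothing is mapped at that address.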
At the time of the crash, the register values were as follows (note that r4 holds f6b9f777, the faulting address):
[ 493.169158] PC is at __kmalloc_track_caller+0xa4/0x1ec
[ 493.174591] LR is at 0x80569dc0
[ 493.177917] pc : [<801094d8>] lr : [<80569dc0>] psr: a0000113
[ 493.177947] sp : 80569dc0 ip : 89011b70 fp : 80569dfc
[ 493.190124] r10: 00001fea r9 : 00000001 r8 : 00000000
[ 493.195648] r7 : 00000940 r6 : 000000d1 r5 : ed002900 r4 : f6b9f777
[ 493.202575] r3 : 80568000 r2 : 00000000 r1 : 08aa8000 r0 : 80589c00
[ 493.209503] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 493.217254] Control: 10c5387d Table: ec4c406a DAC: 00000015
Take the same version of the kernel source, and you can see that line 3415 matches exactly the warning message in the error log:
size_t ksize(const void *object)
{
    struct page *page;

    if (unlikely(object == ZERO_SIZE_PTR))
        return 0;

    page = virt_to_head_page(object);

    if (unlikely(!PageSlab(page))) {
        WARN_ON(!PageCompound(page));    /* this is line 3415 */
        return PAGE_SIZE << compound_order(page);
    }

    return slab_ksize(page->slab);
}
EXPORT_SYMBOL(ksize);

ksize() is a global, exported function, so "ksize" is present as a symbol in the kernel image, and the crash point near it is reported relative to that symbol: it lies at ksize+0x70.
As for why the page's compound-page attribute was not set correctly, you have to read the call history:
[ 494.068664] Backtrace:
[ 494.071289] [<80109434>] (__kmalloc_track_caller+0x0/0x1ec) from [<80335ec0>] (__alloc_skb+0x60/0xfc)
[ 494.081085] [<80335e60>] (__alloc_skb+0x0/0xfc) from [<80336530>] (__netdev_alloc_skb+0x2c/0x54)
[ 494.090423] [<80336504>] (__netdev_alloc_skb+0x0/0x54) from [<7f078788>] (stmmac_poll+0x590/0x794 [stmmac])
[ 494.100738] r4:ed0b84c0 r3:00000000
[ 494.104553] [<7f0781f8>] (stmmac_poll+0x0/0x794 [stmmac]) from [<8033f23c>] (net_rx_action+0x88/0x1f0)
[ 494.114440] [<8033f1b4>] (net_rx_action+0x0/0x1f0) from [<80045fb4>] (__do_softirq+0x12c/0x260)
[ 494.123657] [<80045e88>] (__do_softirq+0x0/0x260) from [<8004659c>] (irq_exit+0x58/0xb0)
[ 494.132263] [<80046544>] (irq_exit+0x0/0xb0) from [<8000fa08>] (handle_IRQ+0x8c/0xc8)
[ 494.140563] r4:00000078 r3:0000020c
[ 494.144378] [<8000f97c>] (handle_IRQ+0x0/0xc8) from [<80008658>] (gic_handle_irq+0x48/0x6c)
[ 494.153228] r5:80569f40 r4:fa212000
[ 494.157043] [<80008610>] (gic_handle_irq+0x0/0x6c) from [<8000e600>] (__irq_svc+0x40/0x70)
[ 494.165802] Exception stack(0x80569f40 to 0x80569f88)
From the backtrace above, I can only guess that the calling sequence is as follows:
In net/core/skbuff.c (excerpt, lines 170 to 201):
struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
                            int fclone, int node)
{
    ...
    size = SKB_WITH_OVERHEAD(ksize(data));
    prefetchw(data + size);
Notice the call __alloc_skb() ==> ksize(), which ended up with the *pgd error above.
I also looked at a few functions below stmmac_poll(). The offset 0x590 is quite far into stmmac_poll(), so the call is unlikely to come from that function's own code. The functions that follow it are declared "static" and were most likely inlined by the compiler, so they have no symbol of their own and the disassembly still attributes their code to the "stmmac_poll" symbol. It looks like a descriptor-related bug.
See this thread:
http://comments.gmane.org/gmane.linux.network/236183
The fix went in after 3.4.0; to be specific, in 3.4.6:
http://lwn.net/Articles/507526/
--
Regards,
Peter Teoh
More information about the Kernelnewbies mailing list