How to analyze kernel Oops dump

Peter Teoh htmldeveloper at gmail.com
Wed Feb 6 00:24:50 EST 2013


Perhaps let me try:

The cause of the crash is here:

[  493.113464] Unable to handle kernel paging request at virtual address f6b9f777
[  493.124298] pgd = ec4c4000
[  493.127166] [f6b9f777] *pgd=00000000

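i.e. the first-level page table (pgd) is at 0xec4c4000, and its entry for
the faulting address 0xf6b9f777 is zero, so nothing is mapped at that
address.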

At the time of the crash, the register values were as follows (note that
r4 holds the faulting address f6b9f777):

[  493.169158] PC is at __kmalloc_track_caller+0xa4/0x1ec
[  493.174591] LR is at 0x80569dc0
[  493.177917] pc : [<801094d8>]    lr : [<80569dc0>]    psr: a0000113
[  493.177947] sp : 80569dc0  ip : 89011b70  fp : 80569dfc
[  493.190124] r10: 00001fea  r9 : 00000001  r8 : 00000000
[  493.195648] r7 : 00000940  r6 : 000000d1  r5 : ed002900  r4 : f6b9f777
[  493.202575] r3 : 80568000  r2 : 00000000  r1 : 08aa8000  r0 : 80589c00
[  493.209503] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment kernel
[  493.217254] Control: 10c5387d  Table: ec4c406a  DAC: 00000015

Take the same version of the kernel source, and you can see that line 3415
matches exactly the warning message in the error log:

size_t ksize(const void *object)
{
        struct page *page;

        if (unlikely(object == ZERO_SIZE_PTR))
                return 0;

        page = virt_to_head_page(object);

        if (unlikely(!PageSlab(page))) {
                WARN_ON(!PageCompound(page));  /* <== this is line 3415 */
                return PAGE_SIZE << compound_order(page);
        }

        return slab_ksize(page->slab);
}
EXPORT_SYMBOL(ksize);

ksize is an exported symbol, so the kernel image carries "ksize" as the
symbol nearest to the crash point, which is located at offset +0x70 from
"ksize".

As for why the page's compound-page attributes have not been set
correctly, you have to read the history:

[  494.068664] Backtrace:
[  494.071289] [<80109434>] (__kmalloc_track_caller+0x0/0x1ec) from [<80335ec0>] (__alloc_skb+0x60/0xfc)
[  494.081085] [<80335e60>] (__alloc_skb+0x0/0xfc) from [<80336530>] (__netdev_alloc_skb+0x2c/0x54)
[  494.090423] [<80336504>] (__netdev_alloc_skb+0x0/0x54) from [<7f078788>] (stmmac_poll+0x590/0x794 [stmmac])
[  494.100738]  r4:ed0b84c0 r3:00000000
[  494.104553] [<7f0781f8>] (stmmac_poll+0x0/0x794 [stmmac]) from [<8033f23c>] (net_rx_action+0x88/0x1f0)
[  494.114440] [<8033f1b4>] (net_rx_action+0x0/0x1f0) from [<80045fb4>] (__do_softirq+0x12c/0x260)
[  494.123657] [<80045e88>] (__do_softirq+0x0/0x260) from [<8004659c>] (irq_exit+0x58/0xb0)
[  494.132263] [<80046544>] (irq_exit+0x0/0xb0) from [<8000fa08>] (handle_IRQ+0x8c/0xc8)
[  494.140563]  r4:00000078 r3:0000020c
[  494.144378] [<8000f97c>] (handle_IRQ+0x0/0xc8) from [<80008658>] (gic_handle_irq+0x48/0x6c)
[  494.153228]  r5:80569f40 r4:fa212000
[  494.157043] [<80008610>] (gic_handle_irq+0x0/0x6c) from [<8000e600>] (__irq_svc+0x40/0x70)
[  494.165802] Exception stack(0x80569f40 to 0x80569f88)

From the above, I can only guess that the possible calling sequence is as below:

In net/core/skbuff.c:

struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
                            int fclone, int node)
{
        ...
        size = SKB_WITH_OVERHEAD(ksize(data));
        prefetchw(data + size);

Notice the __alloc_skb() ==> ksize() call, which ended up with the *pgd
error above?

I also looked at a few functions below stmmac_poll(). The offset 0x590 is
quite far from the start of stmmac_poll(), so the call is unlikely to be in
that function's own body. The functions defined after it are declared
"static" and were probably inlined, so they have no symbol of their own and
the disassembly still attributes them to the "stmmac_poll" symbol. It looks
like a descriptor-related bug.

See this:

http://comments.gmane.org/gmane.linux.network/236183

It relates to a kernel version after 3.4.0 (3.4.6, to be specific):

http://lwn.net/Articles/507526/

-- 
Regards,
Peter Teoh