Linux do_coredump() and SMP systems

Sudharsan Vijayaraghavan sudvijayr at gmail.com
Thu Feb 19 07:00:26 EST 2015


Hi Greg,

There is a plan to move to 3.14; right now the focus is to iron out
existing issues.
Now, with regard to the core dump issue, we find that 10% of the time we get stuck in

coredump_wait():
   ==> wait_for_completion(&core_state->startup);
I am analyzing exit_mm() to see what is going wrong here.
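
To make the rendezvous concrete, here is a rough user-space sketch (pthreads,
not kernel code) of how I understand it. As far as I can tell, each thread of
the dumping process decrements core_state->nr_threads in exit_mm(), and the
last one completes core_state->startup. The names fake_core_state,
zapped_thread and run_rendezvous are made up for illustration; only the
pattern is meant to mirror the kernel.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 64

/* Hypothetical user-space stand-in for struct core_state. */
struct fake_core_state {
    atomic_int nr_threads;   /* threads that still have to check in */
    pthread_mutex_t lock;
    pthread_cond_t startup;  /* plays the role of the completion */
    bool done;
};

/* What each zapped thread does on its exit path in this model. */
static void *zapped_thread(void *arg)
{
    struct fake_core_state *cs = arg;

    /* atomic_fetch_sub() returns the old value: 1 means we were last. */
    if (atomic_fetch_sub(&cs->nr_threads, 1) == 1) {
        pthread_mutex_lock(&cs->lock);
        cs->done = true;
        pthread_cond_signal(&cs->startup);
        pthread_mutex_unlock(&cs->lock);
    }
    return NULL;
}

/*
 * Dumper side: start n threads and wait until the last one checks in.
 * Returns the remaining count (0 on success) or -1 on a setup failure.
 */
int run_rendezvous(int n)
{
    struct fake_core_state cs;
    pthread_t tids[MAX_THREADS];
    int i, started = 0, remaining;

    if (n < 1 || n > MAX_THREADS)
        return -1;

    atomic_init(&cs.nr_threads, n);
    pthread_mutex_init(&cs.lock, NULL);
    pthread_cond_init(&cs.startup, NULL);
    cs.done = false;

    for (i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, zapped_thread, &cs) != 0)
            break;
        started++;
    }

    /* Only wait if every thread exists; otherwise the count can never
     * reach zero and this wait would block forever. */
    if (started == n) {
        pthread_mutex_lock(&cs.lock);
        while (!cs.done)
            pthread_cond_wait(&cs.startup, &cs.lock);
        pthread_mutex_unlock(&cs.lock);
    }

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    remaining = atomic_load(&cs.nr_threads);
    pthread_cond_destroy(&cs.startup);
    pthread_mutex_destroy(&cs.lock);

    return started == n ? remaining : -1;
}
```

Calling run_rendezvous(4) should return 0 once all four threads have checked
in. In this model, if even one counted thread never reaches the decrement,
the waiter blocks forever, which is the kind of hang I am looking for in
exit_mm().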

I have one other question that I am curious about.
In coredump_wait() there is a loop that waits for each task to become
inactive (i.e. no longer running on any core):

                ptr = core_state->dumper.next;
                while (ptr != NULL) {

                        pr_err("pid: %d %s() calling wait_task_inactive() for pid : %d\n",
                               tsk->pid, __func__, ptr->task->pid);

                        wait_task_inactive(ptr->task, 0);

                        pr_err("pid: %d %s() wait_task_inactive() returned for pid : %d\n",
                               tsk->pid, __func__, ptr->task->pid);

                        ptr = ptr->next;
                }

There is a delay between the crash and the actual generation of the core
dump due to the above loop.
On a multicore system it is quite possible that other threads of the same
process are running on other cores; as a consequence,
the address space, program counter, etc. can change.

Given this, the generated core dump will not reflect the state of the process
(the various thread registers and the mm) as it was at the time of the crash
(in any thread or the main process).
Is my understanding correct? I am just probing for a way to get rid of this discrepancy.

Thanks,
Sudharsan



On Wed, Feb 18, 2015 at 9:31 PM, Greg KH <greg at kroah.com> wrote:
> On Wed, Feb 18, 2015 at 11:44:32AM +0530, Sudharsan Vijayaraghavan wrote:
>> We are doing a prototype, and so many changes have gone into the kernel
>> that we are finding it difficult to upgrade to the latest immediately
>
> What changes are you making to the kernel that you are sticking with
> such an old version (3.8 is 2 years old now, and over 155 thousand
> changes have happened to the kernel since then)?

>> However, I ran through the code once again; indeed the kernel handles it:
>> down_write(&mm->mmap_sem) in coredump_wait() makes sure the second
>> coredump is stopped, and a negative value is returned for core_waiters
>
> Great, so it works now?
>
> confused,
>
> greg k-h


