Linux do_coredump() and SMP systems
Sudharsan Vijayaraghavan
sudvijayr at gmail.com
Tue Feb 17 08:41:55 EST 2015
Hi All,
We are running a 3.8 kernel.
I have an unusual scenario in which we hit several issues in do_coredump().
We have an SMP system with thousands of cores, with each pthread pinned to
its own core. The main process containing these pthreads runs on the first
core.
Issue #1:
When one of the threads dumps core, we enter do_coredump(). Another
thread in the same process, running on a different core, can also fault
and dump core before the SIGKILL resulting from the first core dump has
been delivered to it.
This allows do_coredump() to be entered more than once.
Once two threads are inside do_coredump(), one can kill the other with
SIGKILL, and the result is completely unpredictable. There is no guarantee
that we end up with two core files.
The Linux kernel does not seem to handle this at all.
Adding a spin lock within do_coredump() would prevent multiple entries
into do_coredump().
I want to know whether the Linux kernel really does not handle the above
case, or whether I am missing something.
Please clarify.
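
To make the "only the first entrant dumps" idea concrete, here is a minimal
userspace sketch. It is not kernel code and does not show how do_coredump()
works in 3.8. It only illustrates the kind of guard proposed above, with an
atomic flag standing in for the lock. All names (try_begin_dump,
faulting_thread, run_race) are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 64

/* Hypothetical guard: set by the first thread that starts a "dump". */
static atomic_flag dump_in_progress = ATOMIC_FLAG_INIT;
/* Counts how many threads actually won the right to dump. */
static atomic_int dumps_started;

/* Returns 1 only for the first caller, 0 for every later caller. */
static int try_begin_dump(void)
{
    return !atomic_flag_test_and_set(&dump_in_progress);
}

/* Stand-in for a thread that faults and tries to start a core dump. */
static void *faulting_thread(void *arg)
{
    (void)arg;
    if (try_begin_dump())
        atomic_fetch_add(&dumps_started, 1);
    /* Losers would simply wait to be killed; here they just return. */
    return NULL;
}

/* Lets nthreads threads "fault" concurrently; returns the number of dumps. */
int run_race(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int created = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    atomic_flag_clear(&dump_in_progress);
    atomic_store(&dumps_started, 0);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, faulting_thread, NULL) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    return atomic_load(&dumps_started);
}
```

However many threads race, run_race() reports a single dump as long as at
least one thread was created. Whether the kernel needs such a guard, or
already has an equivalent one, is exactly the question above.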
Issue #2:
Within do_coredump(), SIGKILL is sent to all threads in the process other
than the one performing the core dump.
There is no guarantee that SIGKILL will be received immediately by all
threads in the process, which means the state of those threads (in
particular the per-thread backtrace) can differ considerably from what it
was when the offending thread initiated the core dump.
This in turn means that the per-thread backtraces in the generated core
dump are not accurate.
Please confirm my understanding and advise on how this problem can be solved.
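
The concern in issue #2 can be illustrated with a small userspace analogy.
Again, this is not kernel code: a flag plays the role of the pending SIGKILL
and a counter plays the role of a thread's changing state. The names
(stop_requested, progress, measure_drift) are hypothetical. The worker only
notices the stop request the next time it checks the flag, so its state may
have moved on since the moment the "fault" was observed.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Stand-in for "SIGKILL has been sent but not yet acted upon". */
static atomic_int stop_requested;
/* Stand-in for the state (for example the backtrace) of another thread. */
static atomic_long progress;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        atomic_fetch_add(&progress, 1);      /* state keeps changing */
        if (atomic_load(&stop_requested))    /* request noticed only here */
            break;
    }
    return NULL;
}

/*
 * Returns how far the worker's state advanced between the moment of the
 * "fault" and the moment the worker actually stopped, or -1 on error.
 */
long measure_drift(void)
{
    pthread_t t;
    long at_fault, at_stop;

    atomic_store(&stop_requested, 0);
    atomic_store(&progress, 0);

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;

    /* Wait until the worker is really running. */
    while (atomic_load(&progress) == 0)
        sched_yield();

    at_fault = atomic_load(&progress);   /* state when the fault happened */
    atomic_store(&stop_requested, 1);    /* analogous to sending SIGKILL */
    pthread_join(t, NULL);
    at_stop = atomic_load(&progress);    /* state when the thread stopped */

    return at_stop - at_fault;
}
```

A result greater than zero means the worker kept running after the stop
request was posted, which is the kind of drift described above. The size of
the drift varies from run to run.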
Thanks,
Sudharsan