Run queue corruption issue

Greg KH greg at kroah.com
Wed May 18 02:16:05 EDT 2016


On Wed, May 18, 2016 at 01:29:55AM -0400, Jerrin Shaji George wrote:
> Hi Greg,
> 
> Thanks for your response.
> 
> On Tue, May 17, 2016 at 7:20 PM, Greg KH <greg at kroah.com> wrote:
> > On Tue, May 17, 2016 at 06:55:07PM -0400, Jerrin Shaji George wrote:
> >> Hi All,
> >>
> >> I wanted help with a piece of code that I have been working on.
> >>
> >> Please see -
> >>
> >> https://gist.github.com/jerrinsg/333e584d1f65dc95b9f13b61dcebdaa7
> >>
> >> I have written two functions, migrate_to and migrate_back. migrate_to is used
> >> to remove a process from the run queue, and migrate_back is used to insert this
> >> process back into the run queue.
> >>
> >> The gist is taken from a larger project, where we are working on building
> >> a mechanism to support thread migration across heterogeneous processors.
> >> migrate_to_call() will be called by a thread which wants to remove itself from
> >> the run queue (hence, it will pass the current task struct as the migration
> >> argument). Once the other processor completes execution of the assigned task, it
> >> will interrupt the main processor, which runs an interrupt handler, which in
> >> turn calls the migrate_back_call() function. It passes the task struct of the
> >> process that was removed from the run queue earlier to this function.
> >>
> >> This mechanism works fine the first few times, but when this process is repeated
> >> many times in a loop, I am seeing a run queue corruption:
> >> https://gist.github.com/jerrinsg/0ab09cd435d8d2cb6ae692c7e6f4f26b
> >>
> >> Is there anything wrong in the process dequeue or enqueue function that I have
> >> written? Please help!
> >
> > volatile doesn't mean what you think it does, please don't use it in the
> > kernel.
> >
> 
> This flag was to be used for synchronization. I will change this.
> 
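
On the volatile point, a rough user-space sketch (not kernel code, and not
taken from the gist) of what a synchronization flag needs: an atomic store
with release ordering on the writer side and an atomic load with acquire
ordering on the reader side, so the data written before the flag is
guaranteed to be visible once the flag is seen. A volatile int gives neither
atomicity nor ordering. The names payload, ready, worker, wait_for_payload
and run_handoff are made up for illustration. Inside the kernel the usual
tools would be a completion (struct completion with complete() and
wait_for_completion()) or the kernel's own atomic and barrier helpers, not
C11 atomics.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names, for illustration only. */
static int payload;        /* plain data handed from worker to waiter */
static atomic_int ready;   /* synchronization flag: atomic, not volatile */

/* Writer: fill in the data, then publish it by setting the flag. */
static void *worker(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the write to payload cannot be reordered after this store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: spin until the flag is set, then read the data. */
static int wait_for_payload(void)
{
    /* Acquire: pairs with the release store in worker(). */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return payload;
}

/* Run one hand-off; returns the value seen by the reader, or -1 on error. */
int run_handoff(void)
{
    pthread_t t;
    int seen;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    seen = wait_for_payload();
    pthread_join(t, NULL);
    return seen;
}
```

Calling run_handoff() from a small main() and building with -pthread is
enough to try it. The busy-wait is only there to keep the sketch short; a
real waiter would sleep instead of spinning.
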
> > And why are you using "raw_spin_lock()"?
> 
> I used this after seeing it used in sched/core.c. Can you please let me know if I
> should instead use a different function to lock the run queue?

Ah, I don't know; I don't mess with the scheduler, thankfully :)

> >> Kernel used: Linux 3.13
> >
> > Wow, that's obsolete and buggy. Why use such an old thing?
> 
> This is the codebase that I inherited. Once I get the basic prototype working, I
> will be working to port it to a newer version of the kernel.

Try porting it to a modern kernel and then posting your real patch for
review, that would make things a bit more obvious and probably show your
bug better.

good luck,

greg k-h