Adding a sched_class after the removal of ".next" regarding priority

Paulo Miguel Almeida paulo.miguel.almeida.rodenas at gmail.com
Thu May 20 20:25:07 EDT 2021


On Fri, May 21, 2021 at 12:05:27AM +0200, J Mårtensson wrote:
> Hi,
> I have been trying to add a new scheduler to the Linux kernel. I have
> found that to add a sched_class, I need to add it to SCHED_DATA in
> vmlinux.lds.h instead of editing the now-removed .next variable.
I'm assuming that you are referring to this patch from Steven Rostedt, right? 
https://lore.kernel.org/lkml/20191219214451.340746474@goodmis.org/

> Depending on what order I put into the priority list, it will crash
> the kernel during the booting process after rebooting. Any tips on
> what could be causing this would be appreciated!
I am not sure if that will help, but if I were you I would try to
isolate the problem with the debugging mechanisms available in the
kernel. Have you tried compiling the kernel with CONFIG_SCHED_DEBUG=y
and making use of earlyprintk?

https://www.kernel.org/doc/html/latest/x86/earlyprintk.html

Since the machine reboots because of the error, you won't be able to see
the logs. If you enable early printk and send those messages to a piece
of hardware that isn't rebooting, you will at least be able to read them
and add or remove print statements to help you figure out what's going
wrong.

There is no silver-bullet solution for that, but I'm sure that you will
have a lot of fun trying to debug this. Once you find the solution,
please share it with us. I'm sure it will be beneficial for future
developers with similar questions.

> 
> Currently this works
> 
> #define SCHED_DATA              \
>     STRUCT_ALIGN();             \
>     __begin_sched_classes = .;      \
>     *(__idle_sched_class)           \
>     *(__my_sched_class)         \
>     *(__fair_sched_class)           \
>     *(__rt_sched_class)         \
>     *(__dl_sched_class)         \
>     *(__stop_sched_class)           \
>     __end_sched_classes = .;
> 

This most likely works because, during boot, every task that runs is
handled by one of the classes from dl_sched_class down to
fair_sched_class, so a class placed below fair is only consulted when
none of them has anything to run. So either there is no moment when the
CPU is idle throughout the process (unlikely) or the bug in your
__my_sched_class isn't triggered when there is nothing in the CPU run
queue.

I could be wrong though, so if anyone has a better explanation, please
chime in.

> While this does not
> 
> #define SCHED_DATA              \
>     STRUCT_ALIGN();             \
>     __begin_sched_classes = .;      \
>     *(__idle_sched_class)           \
>     *(__fair_sched_class)           \
>     *(__my_sched_class)         \
>     *(__rt_sched_class)         \
>     *(__dl_sched_class)         \
>     *(__stop_sched_class)           \
>     __end_sched_classes = .;
> 
It's hard to speculate about why it's failing, but if I were a gambling
man I would say that, since __my_sched_class now has a higher priority
than __fair_sched_class, it breaks when the scheduler tries to execute
the __my_sched_class methods defined through the DEFINE_SCHED_CLASS
macro.

Example from the fair.c sched class:
https://github.com/torvalds/linux/blob/02dbb7246c5bbbbe1607ebdc546ba5c454a664b1/kernel/sched/fair.c#L11261-L11304

Paulo Miguel Almeida

> 
> Regards
> Jacob


