[Scheduler] CFS - What happens to each task's slice if nr_running * min_granularity > sched_latency?

Evan T Mesterhazy etm2131 at columbia.edu
Thu Apr 2 22:10:11 EDT 2020

Hi everyone ~

I've read a few different references on CFS and have been looking through
the scheduler code in fair.c. One thing I don't completely understand is
what happens when so many processes are runnable that
nr_running * min_granularity > sched_latency.

I know that the scheduler will expand the period so that each process can
run for at least the min_granularity, *but how does that interact with nice
numbers*? Here's the code for expanding the period:

/*
 * The idea is to set a period in which each task runs once.
 *
 * When there are too many tasks (sched_nr_latency) we have to stretch
 * this period because otherwise the slices get too small.
 *
 * p = (nr <= nl) ? l : l*nr/nl
 */
static u64 __sched_period(unsigned long nr_running)
{
        if (unlikely(nr_running > sched_nr_latency))
                return nr_running * sysctl_sched_min_granularity;
        else
                return sysctl_sched_latency;
}

Here's the code for calculating an individual process's slice. It looks
like the weighting formula is used here regardless of whether the period
has been expanded.

   - If that's the case, doesn't that mean that some processes will still
   get a slice that's smaller than the min_granularity?

/*
 * We calculate the wall-time slice from the period by taking a part
 * proportional to the weight.
 *
 * s = p*P[w/rw]
 */
static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);

        for_each_sched_entity(se) {
                struct load_weight *load;
                struct load_weight lw;

                cfs_rq = cfs_rq_of(se);
                load = &cfs_rq->load;

                if (unlikely(!se->on_rq)) {
                        lw = cfs_rq->load;

                        update_load_add(&lw, se->load.weight);
                        load = &lw;
                }
                slice = __calc_delta(slice, se->load.weight, load);
        }
        return slice;
}

I ran a test by starting five busy-loop processes with a nice level of -10.
Next, I launched ~40 busy-loop processes with a nice level of 0 (all
pinned to the same CPU). I expected CFS to expand the period and assign
each process a slice equal to the min granularity. However, the 5 processes
with nice = -10 still used considerably more CPU than the other processes.

Is __calc_delta in the function above actually expanding the slice further
based on the nice weighting of the tasks? The __calc_delta function is a
bit difficult to follow, so I haven't quite figured out what it's doing.

tl;dr I know that CFS expands the period if lots of processes are running.
What I'm not sure about is how nice levels affect the slice each task gets
if the period has been expanded due to a high number of running tasks.
