Determining patch impact on a specific config

Nicholas Mc Guire der.herr at hofr.at
Wed Aug 17 12:50:30 EDT 2016


On Wed, Aug 17, 2016 at 05:39:27PM +0200, Greg KH wrote:
> On Wed, Aug 17, 2016 at 02:49:22PM +0000, Nicholas Mc Guire wrote:
> > On Wed, Aug 17, 2016 at 04:17:19PM +0200, Greg KH wrote:
> > > On Wed, Aug 17, 2016 at 02:01:28PM +0000, Nicholas Mc Guire wrote:
> > > > On Wed, Aug 17, 2016 at 03:52:16PM +0200, Greg KH wrote:
> > > > > On Wed, Aug 17, 2016 at 03:25:44PM +0200, Greg KH wrote:
> > > > > > On Wed, Aug 17, 2016 at 12:39:39PM +0000, Nicholas Mc Guire wrote:
> > > > > > > 
> > > > > > > Hi !
> > > > > > > 
> > > > > > >  For a given patch I would like to find out if it impacts a
> > > > > > >  given configuration or not. Now of course one could compile the
> > > > > > >  kernel for the configuration prior to the patch, then apply the
> > > > > > >  patch and recompile to find out if there is an impact but I would
> > > > > > >  be looking for some smarter solution. Checking files alone
> > > > > > >  unfortunately will not do it, due to ifdefs and friends: make
> > > > > > >  would detect a change and recompile even if the affected code
> > > > > > >  area is actually dropped by the preprocessor.
> > > > > > > 
> > > > > > >  What I'm trying to find out is how many of, e.g., the stable
> > > > > > >  fixes of 4.4-4.4.14 would have impacted a given configuration - the
> > > > > > >  whole exercise is intended for some statistical analysis of bugs
> > > > > > >  in linux-stable.
> > > > > 
> > > > > Also, are you going to be analyzing the bugs in the stable trees, or
> > > > > the ones we just happen to fix?
> > > > > 
> > > > > Note, that's not always the same thing :)
> > > > >
> > > > What we have been looking at first is the stable fixes
> > > > for which the bug-introducing commit is known via a Fixes: tag. That is only
> > > > a first approximation but correlates very well with the
> > > > overall stable fix rates. And from the regression analysis
> > > > of the stable fix rates over versions one can then estimate the
> > > > residual bugs if one knows the distribution of the bug
> > > > survival times - which one again can estimate based on the
> > > > bug-fixes that have Fixes: tags.
> > > 
> > > That is all relying on the Fixes: tags, which are not used evenly across
> > > the kernel at all.  Heck, there are still major subsystems that NEVER
> > > mark a single patch for the stable trees, let alone adding Fixes: tags.
> > > Same thing goes for most cpu architectures.
> > 
> > Well for the config we studied it was not that bad
> > 
> > 4.4 - 4.4.13 stable bug-fix commits
> >           total fix    with Fixes:   % with Fixes:
> >           commits      tag           tag within
> >           (1643)       (589)         subsystem
> > kernel    3.89%        4.75%         43.7%
> > mm        1.82%        2.17%         53.3%
> > block     0.36%        0.84%         83.3%!
> > fs        8.76%        4.92%*        20.1%*
> > net       9.31%        12.56%        48.3%
> > drivers   47.96%       49.23%        36.8%
> > include   6.87%        19.18%        28.3%*
> > arch/x86  4.50%        12.56%        33.7%
> >  (Note that the percentages here do not add up
> >   to 100% because we just picked out x86 and did not
> >   include all subsystems, e.g. lib is missing.)
> > 
> >  So fs is significantly below and include a bit - block is
> >  hard to say, simply because there were only 6 stable fixes, of
> >  which 5 had Fixes: tags, so that sample is too small.
> >  Correlating the overall distribution of stable fixes over sublevels
> >  with the stable fixes carrying a Fixes: tag gives me an R^2 of 0.76,
> >  so that does show that using Fixes: tags for any trending
> >  is reasonable. As noted, we are looking at statistical properties
> >  to come up with expected values, nothing more.
> 
> But you aren't comparing that to the number of changes that are
> happening in a "real" release.  If you do that, you will see the
> subsystems that never mark things for stable, which you totally miss
> here, right?

We are not looking at the run-up to 4.4 here; we are looking at
the fixes that go into 4.4.1 and later, and for those we look at all
commits in linux-stable. So that should cover ALL subsystems
for which bugs were discovered and fixed (either found in 4.4.X or
ported over from findings in other 4.X trees).
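
(Just to make concrete what counting means here: a minimal Python
sketch - not the exact tooling we use - that pulls such per-subsystem
counts out of a linux-stable checkout. The tag names and subsystem
paths are examples, and every non-merge commit in the range is
counted as a fix, release commits included:)

    import subprocess

    def count_commits(rev_range, grep=None, path=None, repo="."):
        """Count non-merge commits in rev_range, optionally restricted to
        commits whose message matches `grep` and/or that touch `path`."""
        cmd = ["git", "-C", repo, "rev-list", "--count", "--no-merges"]
        if grep:
            cmd += ["--grep", grep]
        cmd.append(rev_range)
        if path:
            cmd += ["--", path]
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return int(out.stdout.strip())

    rng = "v4.4..v4.4.14"        # all commits that went into 4.4.1 - 4.4.14
    for subsys in ["kernel", "mm", "block", "fs", "net",
                   "drivers", "include", "arch/x86"]:
        total  = count_commits(rng, path=subsys)
        tagged = count_commits(rng, grep="^Fixes:", path=subsys)
        share  = 100.0 * tagged / total if total else 0.0
        print("%-9s %5d %5d  %5.1f%%" % (subsys, total, tagged, share))

The same two counts over the whole tree give the kind of total/tagged
split shown in the table above.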

> 
> For example, where are the driver subsystems that everyone relies on
> that are changing upstream, yet have no stable fixes?  What about the
> filesystems that even more people rely on, yet have no stable fixes?
> Are those code bases just so good and solid that there are no bugs to be
> fixed?  (hint, no...)

That is not what we are claiming - the model here is that
operation in the field uncovers bugs and the critical bugs are
fixed in stable releases. That there are more fixes and lots
of cleanups that go into stable is clear, but with respect to
the usability of the kernel we do assume that if a bug in
driver X is found that renders this driver unusable
or destabilizes the kernel, it will be fixed in the stable
releases as well (which is also visible in close to 50% of the
fixes being in drivers) - now if that assumption is overly
naive then you are right, and the assessment will not hold.

> 
> So because of that, you can't use the information about what I apply to
> stable trees as an indication that those are the only parts of the
> kernel that have bugs to be fixed.

So a critical bug discovered in 4.7 that is also found
to apply to, say, 4.4.14 would *not* be fixed in the 4.4.15 stable
release?

> 
> > > So be careful about what you are trying to measure, it might just be not
> > > what you are assuming it is...
> > 
> > An R^2 of 0.76 does indicate that the commits with Fixes: tags in the 4.4 series
> > represent the overall stable fixes quite well.
> 
> "overall stable fixes".  Not "overall kernel fixes", two very different
> things, please don't confuse the two.

I'm not - we are looking at stable fixes, not kernel fixes, the
reason for that simply being that for kernel fixes it is not
possible to say whether they are bug-fixes or optimizations/enhancements
- at least not in any automated way.

The focus on stable dot releases and their fixes was chosen 
 * because it is manageable
 * because we assume that critical bugs discovered will be fixed
 * and because there are no optimizations or added features 

> 
> And because of that, I would state that "overall stable fixes" number
> really doesn't mean much to a user of the kernel.

It does for those that are using some LTS release, and it says
something about the probability of a bug in a stable release
being detected. Or would you say that 4.4.13 is not to be
expected to be better off than 4.4.1? From the data we have
looked at so far - the lifetime of a bug in -stable as well as
the discovery rate of bugs in sublevel releases -
it seems clear that the reliability of the kernel over
sublevel releases is increasing and that this can be utilized
to select a kernel version more suitable for HA or critical
systems, based on trending/analysis.

> 
> > > > I don't know yet how robust these models will be in the end
> > > > but from what we have until now I do think we can come up
> > > > with quite sound predictions for the residual faults in the
> > > > kernel.
> > > 
> > > Based on what I know about how stable patches are picked and applied, I
> > > think you will find it is totally incorrect.  But hey, what do I know?
> > > :)
> > 
> > Well, if I look at the overall stable-fixes development - not just those
> > with Fixes: tags - I get very clear trends if we look at stable fixes
> > over sublevels (linear model using a gamma distribution):
> > 
> > ver  intercept  slope      p-value  DoF AIC
> > 3.2  4.2233783  0.0059133  < 2e-16  79  2714.8
> > 3.4  3.9778258  -0.0005657 0.164 *  110 4488
> > 3.10 4.3841885  -0.0085419 < 2e-16  98  2147.1
> > 3.12 4.7146752  -0.0014718 0.0413   58  1696.9
> > 3.14 4.6159638  -0.0131122 < 2e-16  70  2124.8
> > 3.18 4.671178   -0.006517  7.34e-5  34  1881.2
> > 4.1  4.649701   -0.004211  0.09     25  1231.8
> > 4.4  5.049331   -0.039307  7.69e-11 12  571.48
> > 
> > So while the confidence levels of some (notably 3.4) are not
> > that exciting, the overall trend does look reasonably established:
> > the slope is turning negative - indicating that the
> > number of stable fixes per sublevel systematically decreases
> > with sublevels, which does indicate a stable development process.
> 
> I don't understand.  Not everyone uses "fixes:" so you really can't
> use that as an indication of anything.  I know I never do for any patch
> that I write.

This is not using Fixes:, this is over all stable sublevel release
fix-commits - so the overall number of commits in the sublevel
releases is systematically going down with sublevels (sublevels
themselves being of course a convoluted parameter representing
testing/field-usage/review/etc.).
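
(To make the kind of model concrete: this is not the actual R code we
used, just a rough equivalent sketch in Python/statsmodels, again
assuming a local linux-stable checkout with the v4.4.x tags and
counting every non-merge commit per sublevel as a fix:)

    import subprocess
    import numpy as np
    import statsmodels.api as sm

    def n_commits(rev_range, repo="."):
        """Number of non-merge commits in a git revision range."""
        out = subprocess.run(["git", "-C", repo, "rev-list", "--count",
                              "--no-merges", rev_range],
                             capture_output=True, text=True, check=True)
        return int(out.stdout.strip())

    # fix commits per sublevel release: v4.4..v4.4.1, v4.4.1..v4.4.2, ...
    tags = ["v4.4"] + ["v4.4.%d" % i for i in range(1, 15)]
    sublevel = np.arange(1, len(tags), dtype=float)
    fixes = np.array([n_commits("%s..%s" % (a, b))
                      for a, b in zip(tags, tags[1:])], dtype=float)

    # GLM with Gamma family and log link: log E[fixes] = intercept + slope * sublevel
    X = sm.add_constant(sublevel)
    res = sm.GLM(fixes, X,
                 family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    print(res.params)    # intercept and slope, as in the table above
    print(res.pvalues)   # significance of the sublevel trend
    print(res.aic)       # for comparing model variants

res.predict() on the fitted model then gives the expected fix count for
a future sublevel (say 4.4.16), which is what we check the model against
once that release is out.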

> 
> Over time, more people are using the "fixes:" tag, but then that messes
> with your numbers because you can't compare the work we did this year
> with the work we did last year.

Sure, why not? You must look at the relative usage and correlation
of the tags - currently about 36% of the stable commits in the
dot-releases (sublevels) are a usable basis - if the use of
Fixes: increases, all the better - it just means we are moving
towards an R^2 of 1 - results stay comparable, it just means
that the confidence intervals for the current data are wider
than for the data of next year.

> 
> Also, our rate of change has increased, and the number of stable patches
> being tagged has increased, based on me going around and kicking
> maintainers.  Again, because of that you can't compare year to year at
> all.

Why not? We are not selecting a specific class of bugs in any
way - the Fixes: tags are nearly randomly distributed across the
effective fixes in stable - it may be a bit biased because some
maintainer does not like Fixes: tags and her subsystem is
significantly more complex/more buggy/better tested/etc. than
the average subsystem - so we would get a bit of a bias into it
all - but that does not invalidate the results.
You can ask the voters in 3 states whom they will elect president
and this will give you a less accurate result than if you ask in
all 50 states, but if you factor that uncertainty into the
result it is perfectly valid and stays comparable to other results.

I'm not saying that you can simply compare numeric values for
2016 with those from 2017, but you can compare the trends and
the expectations if you model the uncertainties.

Note that we have a huge advantage here - we can make predictions
from the models - say predict 4.4.16 - and then actually check our models.

Now if there are really significant changes, like the task struct
being redone, then that may have a large impact, and the assumption
that the convoluted parameter "sublevel" describes a more or
less stable development might be less correct - it will not be
completely wrong - and consequently the prediction quality will
suffer - but does that invalidate the approach?


> 
> There's also the "bias" of the long-term and stable maintainer to skew
> the patches they review and work to get applied based on _why_ they are
> maintaining a specific tree.  I know I do that for the trees I maintain,
> and know the other stable developers do the same.  But those reasons are
> different, so you can't compare what is done to one tree vs. another one
> very well at all because of that bias.

If the procedures applied do not "jump" but evolve, then bias is
not an issue - you can find many factors that will increase the
uncertainty of any such prediction - but if the parameters, which
are all convoluted - be it by personal preferences of maintainers,
selection of a specific FS in mainline distributions, etc. - still
represent the overall development, and as long as your bias, as you
called it, does not flip-flop from 4.4.6 to 4.4.7, we do not care
too much.

> 
> So don't compare 3.10 to 3.4 or 3.2 and expect even the motivation to be
> identical to what is going on for that tree.
> 

No expectation of anything being constant - we simply say that the
number of fixes was going up with sublevels in 3.2 and is now going
down, and has since then shown improving trends, with 4.4 showing a
robust negative coupling (declining bug-fixes). This is valid
because it is generally *not* the maintainers that discover the
bugs - it's the users/testers/reviewers. I doubt that maintainers
would reject a critical bug-fix provided to them due to personal
bias.

> > > > Some early results were presented at ALS in Japan on July 14th
> > > > but this still needs quite a bit of work.
> > > 
> > > Have a pointer to that presentation?
> > >
> > They probably are somewhere on the ALS site - but I just dropped
> > them to our web-server at
> >   http://www.opentech.at/Statistics.pdf and
> >   http://www.opentech.at/TechSummary.pdf
> > 
> > This is quite a rough summary - so if anyone wants the actual data
> > or R commands used - let me know - no issue with sharing this and having
> > people tell me that I'm totally wrong :)
> 
> Interesting, I'll go read them when I get the chance.
> 
> But I will make a meta-observation, it's "interesting" that people go
> and do analysis of development processes like this, yet never actually
> talk to the people doing the work about how they do it, nor how they
> could possibly improve it based on their analysis.

I do talk to the people - I've been doing this quite a bit - one of
the reasons for hopping over to ALS was precisely that. We have been
publishing our stuff all along, including any findings, patches,
etc.

BUT: I'm not going to go to LinuxCon and claim that I know how
     to do better - not based on the preliminary data we have now.

Once we think we have something solid - I'll be most happy to sit
down and listen.

> 
> We aren't just people to be researched, we can change if asked.
> And remember, I _always_ ask for help with the stable development
> process, I have huge areas that I know need work to improve, just no one
> ever provides that help...

And we are doing our best to support that - be it by documentation
fixes, compliance analysis, type safety analysis and appropriate
patches I've been pestering maintainers with.

But you do have to give us the time to get SOLID data first
and NOT rush conclusions - as you pointed out here yourself,
some of the assumptions we are making might well be wrong, so
what kind of suggestions do you expect here?
 First get the data
  -> make a model
   -> deduce your analysis/sample/experiments
    -> write it all up and present it to the community 
     -> get the feedback and fix the model
and if after that some significant findings are left - THEN
we will show up at LinuxCon and try to find someone to listen
to what we think we have to say...

> 
> And how is this at all a kernelnewbies question/topic?  That's even
> odder to me...

Well, that part is not - but the first question was - I simply could not come up
with some reasonable way to figure out the impact of a patch on a
given config - that did sound to me like it would be a kernelnewbie
question...
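
The best I have so far is the brute-force variant I wanted to avoid -
roughly the Python sketch below. The paths are placeholders, and it
assumes the build is deterministic enough that objects untouched by the
patch come out bit-identical, which is not strictly guaranteed:

    import hashlib
    import pathlib
    import subprocess

    TREE  = "/path/to/linux-stable"      # placeholder: tree with the .config of interest
    PATCH = "/path/to/candidate.patch"   # placeholder: the fix to test

    def build(tree):
        """(Re)build the tree with the already-configured .config."""
        subprocess.run(["make", "-C", tree, "-j8"], check=True)

    def object_hashes(tree):
        """Map every object file in the build tree to a content hash."""
        return {str(p): hashlib.sha1(p.read_bytes()).hexdigest()
                for p in pathlib.Path(tree).rglob("*.o")}

    build(TREE)
    before = object_hashes(TREE)

    # apply the fix (git am for an mbox patch, git apply for a plain diff)
    subprocess.run(["git", "-C", TREE, "am", PATCH], check=True)
    build(TREE)
    after = object_hashes(TREE)

    changed = sorted(p for p, h in after.items() if before.get(p) != h)
    if changed:
        print("patch affects this config, e.g.:", changed[:10])
    else:
        print("no object file changed - patch appears not to affect this config")

A cheaper heuristic would be to only check whether the files touched by
the patch are compiled at all under the config (their .o files exist in
the build tree), but that still misses code dropped by #ifdef inside a
file that is compiled - which is exactly the case I was worried about.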

> 
> sorry for the rant,
>

Rants at that level are most welcome - I'll put some of the
concerns raised on my TODO list for our next round of data analysis.

thx!
hofrat 


