Determining patch impact on a specific config

Wed Aug 17 10:49:22 EDT 2016

On Wed, Aug 17, 2016 at 04:17:19PM +0200, Greg KH wrote:
> On Wed, Aug 17, 2016 at 02:01:28PM +0000, Nicholas Mc Guire wrote:
> > On Wed, Aug 17, 2016 at 03:52:16PM +0200, Greg KH wrote:
> > > On Wed, Aug 17, 2016 at 03:25:44PM +0200, Greg KH wrote:
> > > > On Wed, Aug 17, 2016 at 12:39:39PM +0000, Nicholas Mc Guire wrote:
> > > > > 
> > > > > Hi !
> > > > > 
> > > > >  For a given patch I would like to find out if it impacts a
> > > > >  given configuration or not. Now of course one could compile the
> > > > >  kernel for the configuration prior to the patch, then apply the
> > > > >  patch and recompile to find out if there is an impact but I would
> > > > >  be looking for some smarter solution. Checking files only 
> > > > >  unfortunately will not do it, due to ifdefs and friends so make
> > > > >  would detect a change and recompile even if the affected code
> > > > >  area is actually dropped by the preprocessor.
> > > > > 
> > > > >  What I'm trying to find out is how many of e.g. the stable
> > > > >  fixes of 4.4-4.4.14 would have impacted a given configuration - the
> > > > >  whole exercise is intended for some statistical analysis of bugs
> > > > >  in linux-stable.
> > > 
> > > Also, are you going to be analyzing the bugs in the stable trees, or
> > > the ones we just happen to fix?
> > > 
> > > Note, that's not always the same thing :)
> > >
> > What we have been looking at first is the stable fixes
> > for which the bug-commit is known via a Fixes: tag. That is only
> > a first approximation but correlates very well with the
> > overall stable fix rates. And from the regression analysis
> > of the stable fix rates over versions one can then estimate the
> > residual bugs if one knows the distribution of the bug
> > survival times - which one again can estimate based on the
> > bug-fixes that have Fixes: tags.
> 
> That is all relying on the Fixes: tags, which are not used evenly across
> the kernel at all.  Heck, there are still major subsystems that NEVER
> mark a single patch for the stable trees, let alone adding Fixes: tags.
> Same thing goes for most cpu architectures.

Well for the config we studied it was not that bad

4.4 - 4.4.13 stable bug-fix commits
         % of all  % of all   % of subsys
         stable    fixes with fixes with
         fixes     Fixes: tag Fixes: tag
         (1643)    (589)
kernel   3.89%     4.75%      43.7%
mm       1.82%     2.17%      53.3%
block    0.36%     0.84%      83.3%!
fs       8.76%     4.92%*     20.1%*
net      9.31%     12.56%     48.3%
drivers  47.96%    49.23%     36.8%
include  6.87%     19.18%     28.3%*
arch/x86 4.50%     12.56%     33.7%
 (Note that the percentages here do not add up
  to 100% because we just picked out x86 and did not
  include all subsystems, e.g. lib is missing).
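
 To make the three columns concrete, here is a minimal sketch of the
 arithmetic, assuming the columns are (subsystem fixes / all fixes),
 (subsystem tagged fixes / all tagged fixes) and (tagged / fixes within
 the subsystem). The totals 1643 and 589 and the block counts (6 fixes,
 5 tagged) are from this mail; the function name is just for illustration.

```python
# Totals from the table header: all stable fixes, and those with a Fixes: tag.
TOTAL_FIXES = 1643
TOTAL_TAGGED = 589


def subsystem_shares(fixes, tagged,
                     total_fixes=TOTAL_FIXES, total_tagged=TOTAL_TAGGED):
    """Return the three table columns (in percent) for one subsystem."""
    if fixes <= 0 or tagged < 0 or tagged > fixes:
        raise ValueError("need 0 <= tagged <= fixes and fixes > 0")
    return (
        100.0 * fixes / total_fixes,    # share of all stable fixes
        100.0 * tagged / total_tagged,  # share of all tagged fixes
        100.0 * tagged / fixes,         # tagged fraction inside the subsystem
    )


# block: 6 stable fixes, 5 of them with a Fixes: tag
share_all, share_tagged, tagged_in_subsys = subsystem_shares(6, 5)
print(f"block {share_all:.2f}% {share_tagged:.2f}% {tagged_in_subsys:.1f}%")
```

 The printed values are rounded, so the last digit can differ from the
 table above.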

 So fs is significantly below and include a bit - block is
 hard to say simply because it was only 6 stable fixes of
 which 5 had Fixes: tags so that sample is too small.
 Correlating the overall stable-fixes distribution over sublevels
 with the stable-fixes with Fixes: tags gives me an R^2 of 0.76
 so that does show that for any trending using Fixes: tags
 is reasonable. As noted we are looking at statistical properties
 to come up with expected values, nothing more.
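
 For reference, a correlation of this kind can be sketched as the
 squared Pearson correlation between two per-sublevel count series.
 This is only an illustration: the counts below are made-up placeholder
 numbers, not the data behind the 0.76, and the function name is
 hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Placeholder data: fixes per sublevel, overall and with a Fixes: tag.
all_fixes_per_sublevel = [120, 95, 140, 80, 110, 70]
tagged_fixes_per_sublevel = [40, 38, 55, 25, 45, 20]

print(r_squared(all_fixes_per_sublevel, tagged_fixes_per_sublevel))
```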

> 
> So be careful about what you are trying to measure, it might just be not
> what you are assuming it is...

An R^2 of 0.76 does indicate that the commits with Fixes: tags in the 4.4
series represent the overall stable fixes quite well.

> 
> Also note that LWN.net already published an article based on the fixes:
> tags and tracking that in stable releases.

ok will go dig for that - I did not stumble across that yet - actually
did check lwn.net for Fixes tag related info and found some patches
noted - specifically Doc patches.

> 
> > I don't know yet how robust these models will be at the end
> > but from what we have until now I do think we can come up
> > with quite sound predictions for the residual faults in the
> > kernel.
> 
> Based on what I know about how stable patches are picked and applied, I
> think you will find it is totally incorrect.  But hey, what do I know?
> :)

Well if I look at the overall stable fixes development - not just those
with Fixes: tags - I get very clear trends if we look at stable fixes
over sublevels (linear model using gamma-distribution)

ver  intercept slope      p-value  DoF AIC
3.2  4.2233783  0.0059133 < 2e-16  79  2714.8
3.4  3.9778258 -0.0005657 0.164 *  110 4488
3.10 4.3841885 -0.0085419 < 2e-16  98  2147.1
3.12 4.7146752 -0.0014718 0.0413   58  1696.9
3.14 4.6159638 -0.0131122 < 2e-16  70  2124.8
3.18 4.671178  -0.006517  7.34e-5  34  1881.2
4.1  4.649701  -0.004211  0.09     25  1231.8
4.4  5.049331  -0.039307  7.69e-11 12  571.48

So while the confidence levels of some (notably 3.4) are not
that exciting the overall trend does look reasonably established
that the slope is turning negative - indicating that the
number of stable-fixes per sublevel systematically decreases
with sublevels, which does indicate a stable development process.
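
To give an idea of what such a fit involves, here is a rough sketch of a
Gamma GLM fitted by iteratively reweighted least squares (IRLS). The log
link is an assumption made for this sketch, the counts are made-up
placeholder numbers, and the function name is hypothetical. It only
produces intercept and slope, not the p-value, DoF or AIC columns.

```python
from math import exp, log


def fit_gamma_log_link(xs, ys, iterations=25):
    """Fit log(E[y]) = intercept + slope * x for a Gamma GLM via IRLS.

    With a log link and Gamma variance (proportional to mu^2) all IRLS
    weights are equal, so each step is a plain least-squares fit of the
    working response z on x.
    """
    n = len(xs)
    if n != len(ys) or n < 2 or any(y <= 0 for y in ys):
        raise ValueError("need >= 2 points and strictly positive counts")
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    eta = [log(y) for y in ys]  # start from the observations themselves
    intercept = slope = 0.0
    for _ in range(iterations):
        mu = [exp(e) for e in eta]
        # Working response for the log link: eta + (y - mu) / mu
        z = [e + (y - m) / m for e, y, m in zip(eta, ys, mu)]
        mean_z = sum(z) / n
        slope = sum((x - mean_x) * (zi - mean_z)
                    for x, zi in zip(xs, z)) / sxx
        intercept = mean_z - slope * mean_x
        eta = [intercept + slope * x for x in xs]
    return intercept, slope


# Placeholder data: sublevel number and stable fixes in that sublevel.
sublevels = list(range(1, 13))
fixes = [150, 160, 138, 131, 140, 118, 122, 105, 110, 98, 101, 90]

intercept, slope = fit_gamma_log_link(sublevels, fixes)
print(f"intercept={intercept:.4f} slope={slope:.5f}")
```

A negative slope here means the expected number of fixes shrinks by a
roughly constant factor exp(slope) per sublevel.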

> 
> > Some early results were presented at ALS in Japan on July 14th
> > but this still needs quite a bit of work.
> 
> Have a pointer to that presentation?
>
They probably are somewhere on the ALS site - but I just dropped
them to our web-server at
  http://www.opentech.at/Statistics.pdf and
  http://www.opentech.at/TechSummary.pdf

This is quite a rough summary - so if anyone wants the actual data
or R commands used - let me know - no issue with sharing this and having
people tell me that I'm totally wrong :)

thx!
hofrat