question about kref API

Wed Jul 23 13:33:19 EDT 2014

> -----Original Message-----
> From: Greg KH [mailto:greg at kroah.com]
> Sent: Tuesday, July 22, 2014 4:47 PM
> To: Jeff Haran
> Cc: kernelnewbies at kernelnewbies.org
> Subject: Re: question about kref API
> 
> On Tue, Jul 22, 2014 at 05:25:03PM +0000, Jeff Haran wrote:
> > > -----Original Message-----
> > > From: Greg KH [mailto:greg at kroah.com]
> > > Sent: Monday, July 21, 2014 7:18 PM
> > > To: Jeff Haran
> > > Cc: kernelnewbies at kernelnewbies.org
> > > Subject: Re: question about kref API
> > >
> > > On Tue, Jul 22, 2014 at 12:27:20AM +0000, Jeff Haran wrote:
> > > > Hi,
> > > >
> > > > I've been reading Documentation/kref.txt in order to understand the
> > > > usage of this API. There is part of this documentation that I am
> > > > having difficulty understanding and was hoping somebody on this list
> > > > could clarify this. One of my assumptions in reading this is that the
> > > > expected usage of this API is that for a given object embedding a
> > > > kref, once the object has been initialized the number of calls to
> > > > "put" a given instance of an object should never exceed the number of
> > > > calls to "get" that same instance.
> > >
> > > If it does, the object will be cleaned up and deleted from the system,
> > > so you no longer have a valid pointer.
> > >
> > > > Maybe that's the root of my misunderstanding, but my assumption is
> > > > that calls to kref_get() and kref_put() are expected to be made in
> > > > pairs for a given instance of struct kref. If this is wrong, please
> > > > let me know.
> > >
> > > "pairs" in that the same number must be called in order for things to
> > > work properly.
> > >
> > > > Kref.txt includes some sample code that discusses using a mutex to
> > > > serialize the execution of its get_entry() and put_entry() functions:
> > > >
> > > > 146 static DEFINE_MUTEX(mutex);
> > > > 147 static LIST_HEAD(q);
> > > > 148 struct my_data
> > > > 149 {
> > > > 150         struct kref      refcount;
> > > > 151         struct list_head link;
> > > > 152 };
> > > > 153
> > > > 154 static struct my_data *get_entry()
> > > > 155 {
> > > > 156         struct my_data *entry = NULL;
> > > > 157         mutex_lock(&mutex);
> > > > 158         if (!list_empty(&q)) {
> > > > 159                 entry = container_of(q.next, struct my_data, link);
> > > > 160                 kref_get(&entry->refcount);
> > > > 161         }
> > > > 162         mutex_unlock(&mutex);
> > > > 163         return entry;
> > > > 164 }
> > > > 165
> > > > 166 static void release_entry(struct kref *ref)
> > > > 167 {
> > > > 168         struct my_data *entry = container_of(ref, struct my_data, refcount);
> > > > 169
> > > > 170         list_del(&entry->link);
> > > > 171         kfree(entry);
> > > > 172 }
> > > > 173
> > > > 174 static void put_entry(struct my_data *entry)
> > > > 175 {
> > > > 176         mutex_lock(&mutex);
> > > > 177         kref_put(&entry->refcount, release_entry);
> > > > 178         mutex_unlock(&mutex);
> > > > 179 }
> > > >
> > > > The sample code does not show the creation of the link list headed by
> > > > q,
> > >
> > > That is done there in the static initializer.
> 
> [meta comment, please properly wrap your lines...]
> 
> > You are referring to line 147 here, right? That creates an empty list
> > if I am following the code correctly.
> 
> Yes.
> 
> > What I meant was that the sample code doesn't show how instances of
> > struct my_data get initialized and inserted into the list at q. Makes
> > sense to leave that out for brevity I suppose and you've made it clear
> > below that for this to work right a call to kref_init() must have been
> > made on the refcount fields of any struct my_data instances that got
> > put into the list headed by q.  Thanks.
> 
> Yes, that code is left as an exercise for the reader.  Or you can just
> look at the kernel where it is used in many places directly :)
> 
> > At this point it's rule (3) that I am still struggling with a bit:
> >
> >  50 3) If the code attempts to gain a reference to a kref-ed structure
> >  51    without already holding a valid pointer, it must serialize access
> >  52    where a kref_put() cannot occur during the kref_get(), and the
> >  53    structure must remain valid during the kref_get().
> >
> > In this example, every call to put_entry() results in locking and
> > unlocking the mutex. But if I am following this right, that is only
> > because the entry at the head of the list is removed from the list
> > when and only when the last reference to it is released.
> 
> It has nothing to do with a list, don't get hung up on a list with a
> kref, it's not needed, just used as an example here.
> 
> > If the list_del() happened for some other cause (say a timer expired
> > or user space sent a message to delete the entry), then the taking of
> > the mutex in put_entry() wouldn't be necessary, right?
> 
> No, it still would be.
> 
> > For instance, this would be valid, wouldn't it?
> >
> > static void release_entry(struct kref *ref)
> > {
> > 	struct my_data *entry = container_of(ref, struct my_data, refcount);
> >
> > 	kfree(entry);
> > }
> >
> > static void put_entry(struct my_data *entry)
> > {
> > 	kref_put(&entry->refcount, release_entry);
> > }
> >
> > static void del_entry(struct my_data *entry)
> > {
> > 	mutex_lock(&mutex);
> > 	list_del(&entry->link);
> > 	mutex_unlock(&mutex);
> > 	put_entry(entry);
> > }
> 
> Nope.  Don't get list entries confused with an object that a kref
> happens to maintain.  You still need to lock the kref itself, two
> different people could be calling put_entry() at the same time, right?

Ok, so that's an issue I don't understand. Looking at what kref_put() does,
I don't see why any external locking would be required to serialize
concurrent callers to put_entry(). As I am sure you are aware,
kref_put() ends up being a call to this, where the count passed in is 1:

68 static inline int kref_sub(struct kref *kref, unsigned int count,
 69              void (*release)(struct kref *kref))
 70 {
 71         WARN_ON(release == NULL);
 72 
 73         if (atomic_sub_and_test((int) count, &kref->refcount)) {
 74                 release(kref);
 75                 return 1;
 76         }
 77         return 0;
 78 }

atomic_sub_and_test() is in of itself thread safe, at least that's the way
I read its header comment block. In this case the refcount field is atomically
decremented by 1 and if the result of that decrementation is 0, it returns TRUE
otherwise FALSE. That means that if there are N concurrent callers to
kref_put() when refcount initially contains N, atomic_sub_and_test() will
return TRUE for 1 and only 1 of those callers. All of the other callers will
see atomic_sub_and_test() return FALSE and simply return 0. That means
the release function can get called only once. And since that will only
happen when refcount has gotten down to 0, there will be no other
existing references to the kref as all the other callers must have already
proceeded far enough so that their calls to atomic_sub_and_test() have
completed. They have nothing left to do but return 0 to their callers and if
those callers are following the rules they will make no further references
to the entry. So I don't see how any sort of race condition is possible among
concurrent callers to put_entry() here.

There also doesn't seem to be any sort of race possible between concurrent
callers to put_entry() and get_entry(). If put_entry() finds that the reference
count has gotten down to zero as part of its call to atomic_sub_and_test(),
then the entry can't be in the list so a concurrent caller to get_entry() will either
see that the list empty and never take a new reference or it will take a
reference to the next entry in the list.

My apologies for seeming to rat hole on this, but I must be missing
something fundamental here since I really don't see any race conditions
for your proposed put side locking to prevent. The usage of the atomic
refcount seem to be sufficient in this case. That's the beauty of the API, or at
least that's what I thought was the beauty of it. :)

Clearly there are potential performance benefits in not needing to take
locks or mutexes when they are not necessary. If you could elaborate on where
the race condition is here, I think you'd being doing both me and the
community a great service.

Thanks again,

Jeff Haran

> > In this example, threads that want access to an entry do need to take
> > the mutex in order to get it from the list in get_entry(), but when
> > they are done with it but don't want it deleted from the list they can
> > call put_entry() and no mutex need be taken. The only time the mutex
> > need be taken is when the caller wants to also delete the entry from
> > the list, which is what del_entry() is for.
> >
> > Put another way, the mutex is really there to serialize access to the
> > list, right?
> 
> Again, don't focus on the list, and you should be fine.
> 
> Do you have a real-world use of kref that you are curious about?  That
> might help out more here.
> 
> thanks,
> 
> greg k-h