maybe dumb question about RCU

Fri Apr 10 12:47:21 EDT 2015

> -----Original Message-----
> From: Andev [mailto:debiandev at gmail.com]
> Sent: Thursday, April 09, 2015 7:22 PM
> To: Jeff Haran
> Cc: kernelnewbies
> Subject: Re: maybe dumb question about RCU
> 
> On Thu, Apr 9, 2015 at 7:35 PM, Jeff Haran <Jeff.Haran at citrix.com> wrote:
> > Hi all,
> >
> > I've written some sample code to test out the question I raised earlier and
> it seems that if you call rcu_dereference() on a given RCU protected pointer
> more than once within a read critical section delimited by rcu_read_lock()
> and rcu_read_unlock(), you can get different values. Maybe this is well
> known to experienced kernel developers, but it was unclear to me from the
> documentation I'd read so I figured I'd share what I found with my fellow
> newbies.
> 
> Not exactly, but it is good that you wrote the test! Let us see what is
> happening...
> 
> >
> > The code for the sample kernel module is copied at the end if you'd like to
> try it yourself. Just insmod the resulting rcutest.ko. When I do so, I get this on
> my console:
> >
> > [root at s01b01 ~]# [  496.970678] rcutest initialized [  499.968538]
> > rcu_reader_thread a = 0xffffffffa00ab474 b = 0xffffffffa00ab475
> > off_count = 1 count = 3 [  503.975844] rcu_reader_thread a =
> > 0xffffffffa00ab475 b = 0xffffffffa00ab474 off_count = 2 count = 7 [
> > 507.983410] rcu_reader_thread a = 0xffffffffa00ab474 b =
> > 0xffffffffa00ab475 off_count = 3 count = 11 [  511.990554]
> > rcu_reader_thread a = 0xffffffffa00ab475 b = 0xffffffffa00ab474
> > off_count = 4 count = 15 [  515.997486] rcu_reader_thread a =
> > 0xffffffffa00ab474 b = 0xffffffffa00ab475 off_count = 5 count = 19 [
> > 520.005163] rcu_reader_thread a = 0xffffffffa00ab475 b =
> > 0xffffffffa00ab474 off_count = 6 count = 23 [  524.012020]
> > rcu_reader_thread a = 0xffffffffa00ab474 b = 0xffffffffa00ab475
> > off_count = 7 count = 27 [  528.019520] rcu_reader_thread a =
> > 0xffffffffa00ab475 b = 0xffffffffa00ab474 off_count = 8 count = 31 [
> 532.026716] rcu_reader_thread a = 0xffffffffa00ab474 b = 0xffffffffa00ab475
> off_count = 9 count = 35 [  536.034235] rcu_reader_thread a =
> 0xffffffffa00ab475 b = 0xffffffffa00ab474 off_count = 10 count = 39 [
> 540.041010] rcu_reader_thread a = 0xffffffffa00ab474 b = 0xffffffffa00ab475
> off_count = 11 count = 43 [  544.048265] rcu_reader_thread a =
> 0xffffffffa00ab475 b = 0xffffffffa00ab474 off_count = 12 count = 47 ...
> >
> > Indicating that the values obtained within the critical section can be
> different. So it would seem that the best practice when reading these RCU
> protected pointers is to get them once and only once within the critical
> section via a single call to rcu_dereference_pointer(), store that pointer
> someplace local and then operate on the copy rather than make multiple
> calls to rcu_dereference_pointer().
> >
> > The files copied below are the source rcutest.c, a Makefile and a Kbuild.
> You might have to fiddle with the Makefile to get it to work on your system.
> We build against many kernel versions here so the Makefile is setup to point
> to different kernel versions.
> >
> > Jeff Haran
> >
> > $ cat rcutest.c
> > #include <linux/module.h>
> > #include <linux/moduleparam.h>
> > #include <linux/version.h>
> > #include <linux/kernel.h>
> > #include <linux/types.h>
> > #include <linux/kdev_t.h>
> > #include <linux/fs.h>
> > #include <linux/device.h>
> > #include <linux/cdev.h>
> > #include <linux/uaccess.h>
> > #include <linux/sched.h>
> > #include <linux/wait.h>
> > #include <linux/list.h>
> > #include <linux/kthread.h>
> >
> > char rcu_test_1[1];
> > char rcu_test_2[1];
> > char *rcu_pointer = rcu_test_1;
> > struct task_struct *updater_task;
> > struct task_struct *reader_task;
> > int off_count = 0;
> >
> > static int rcu_updater_thread(void *arg) {
> >         int count = 0;
> >
> >         while (!kthread_should_stop()) {
> >                 if (count & 1) {
> >                         rcu_assign_pointer(rcu_pointer, rcu_test_2);
> >                 } else {
> >                         rcu_assign_pointer(rcu_pointer, rcu_test_1);
> >                 }
> >
> >                 synchronize_rcu();
> 
> This is the key to why you are seeing different values.
> synchronize_rcu() returns only after a grace period which is the duration for
> which rcu pointers are held without being modified. If you have read my
> previous example global_v1 will be valid only within the grace period when
> the read critical section started.
> 
> The crucial thing to note here is that grace periods cannot last forever. Grace
> period is extended if a read critical section is still active. But the period will
> not be extended forever. A grace period will be forced to end after giving it a
> sufficient time, hence the requirement to keep your read side critical
> sections short so that a grace period can end and rcu can free/complete all
> the enqueued callbacks.
> 
> 
> >                 count++;
> >                 msleep(1500);
> 
> This is a huge time. Many grace periods will be elapsed in this time

The msleep() happens after the rcu_assign_pointer()/synchronize_rcu() sequence, not in the middle of it. All that msleep() is for is to make sure the updater isn't spinning that core and I chose the value so that the updater which thus runs every 1.5 seconds doesn't beat with the reader that is holding the rcu read lock for 1 second. Now one can argue that this in the reader is an atypically long time to hold a RCU read lock:

                rcu_read_lock();
                a = rcu_dereference(rcu_pointer);
                start = get_jiffies_64();
                while (get_jiffies_64() < (start + 1000));
                b = rcu_dereference(rcu_pointer);
                rcu_read_unlock();

That causes the RCU read "lock" to be held for 1 second (at least on systems where a jiffie is a millisecond). But if that's "too long" then how long exactly is "too long"? If 1 second is too long, is 1 millisecond too long? How is the developer using this interface to know either how long is too long or how long the code they've written will take to execute given the plethora of different systems it might run on?

The documentation I've seen says that the only restriction on RCU readers is that they don't block and the above doesn't so it satisfies the requirements of the interface near as I can tell.

The conclusion I've come to is that the read critical section is *NOT* there to make sure that the value returned by rcu_dereference() of a given pointer doesn't change within the RCU read critical section (the code delimited by the rcu_read_lock() and rcu_read_unlock()).

Rather the purpose of rcu_read_lock()/rcu_read_unlock() is there to keep synchronize_rcu() at bay so that the updater can safely free any memory block that may have been overwritten in the most recent call to rcu_assign_pointer(). The example I provided where rcu_pointer just oscillates between two different addresses of static buffers was done that way to make it simple. More typically, the pointers passed as the second parameter to rcu_assign_pointer() point to allocated memory. When updaters do that, they need to free those pointers eventually but they can't do that until they know that all readers aren't accessing them anymore. Normally, any freeing would happen after synchronize_rcu() returned and the read critical section is there to tell synchronize_rcu() when its safe to return and let the updater free the old memory block.

At least that's how I read it now.

Jeff Haran