Direct IO and Page cache

Fri Jul 26 00:02:31 EDT 2013

On Fri, Jul 26, 2013 at 05:14:21PM +0800, Chinmay V S wrote:
> > We have direct I/O (O_DIRECT), for example raw devices (/dev/rawctl) that
> > map to the block devices, and we also have the page cache. Now if I've
> > understood this correctly, direct I/O will bypass this page cache, which
> > is fine; I'll not get into the performance debate, but what about data
> > consistency? The kernel cannot and __shouldn't__ try to control how
> > applications are written. So one bad day somebody comes up with
> > an application which does both types of I/O (one that goes
> > through the page cache and one that doesn't), and in that application
> > one instance is writing directly to the backend device and the other
> > instance, which is not aware of this write, goes ahead and writes to the
> > page cache, and that write is written to the backend device later.
> > So wouldn't we end up corrupting the on-disk data?
> 
> Yes. And that is the responsibility of the application. While the
> existence of O_DIRECT may not be common knowledge, anyone who knows about
> it *must* know that it bypasses the kernel page cache and hence *must*
> know the consequences of doing cached and direct I/O on the same file
> simultaneously.
> 
> > I can think of multiple other scenarios which could corrupt the on-disk
> > data if there aren't any safeguarding policies employed by the kernel.
> > But I'm quite sure that the kernel is aware of such nasty attempts, and
> > I'd like to know how the kernel takes care of this.
> 
> O_DIRECT is an explicit flag not enabled by default.
> 
> It is the app's responsibility to ensure that it does NOT misuse the
> feature. Essentially, specifying the O_DIRECT flag is the app's way of
> saying: "Hey kernel, I know what I am doing. Please step aside and
> let me talk to the hardware directly. Please do NOT interfere."
> 
> The kernel happily obliges.
> 
> Later, the app should NOT go crying back to the kernel (and blaming it)
> if the app manages to screw up the direct "relationship" with the
> hardware.

Leaving the hardware at the mercy of the application doesn't sound
like good practice. This __may__ compromise kernel stability too. Also
think of this:

In app1:
fdx = open("blah", O_RDWR | O_DIRECT);  /* bypasses the page cache */
write(fdx, buf, sizeof(buf));

In app2 (unaware of app1):
fdy = open("blah", O_RDWR);             /* goes through the page cache */
write(fdy, buf, sizeof(buf));

I don't think this scenario is unlikely, and if you agree with me then
we may end up with the same potential data corruption. Who should
be blamed here: app1, app2, or the kernel? Or will it be handled
differently in this case?
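
To make the scenario concrete, here is a minimal sketch of the two writers
in compilable form. The helper names write_direct() and write_buffered()
and the BLOCK constant are made up for illustration; the 4096-byte
alignment is an assumption about what the underlying device and filesystem
require, and some filesystems reject O_DIRECT outright (typically with
EINVAL), so the direct path is allowed to fail.

```c
#define _GNU_SOURCE              /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 4096               /* assumed alignment; not queried from the device */

/* app1's path: write one block at offset 0, bypassing the page cache.
 * Returns the number of bytes written, or -1 on any failure. */
ssize_t write_direct(const char *path, int fill)
{
    void *buf = NULL;
    ssize_t n = -1;
    int fd;

    assert(BLOCK % 512 == 0);    /* sanity check on the assumed block size */

    /* O_DIRECT generally needs an aligned buffer, length and offset. */
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0)
        return -1;
    memset(buf, fill, BLOCK);

    fd = open(path, O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd >= 0) {
        n = pwrite(fd, buf, BLOCK, 0);
        close(fd);
    }
    free(buf);
    return n;
}

/* app2's path: write the same block through the page cache.
 * Returns the number of bytes written, or -1 on any failure. */
ssize_t write_buffered(const char *path, int fill)
{
    char buf[BLOCK];
    ssize_t n = -1;
    int fd;

    memset(buf, fill, sizeof(buf));
    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd >= 0) {
        n = pwrite(fd, buf, sizeof(buf), 0);
        close(fd);
    }
    return n;
}
```

Calling write_buffered("blah", 'B') and then write_direct("blah", 'D') from
one process only shows the sequential case, where the last write wins. The
question above is about two unrelated processes doing this at the same
time to overlapping byte ranges, which is the pattern the open(2) man page
advises applications to avoid.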

!!amit