Direct IO and Page cache

Chinmay V S cvs268 at gmail.com
Fri Jul 26 06:31:34 EDT 2013


On Fri, Jul 26, 2013 at 6:21 PM, Chinmay V S <cvs268 at gmail.com> wrote:
> On Fri, Jul 26, 2013 at 12:02 PM, Kumar amit mehta <gmate.amit at gmail.com> wrote:
>> On Fri, Jul 26, 2013 at 05:14:21PM +0800, Chinmay V S wrote:
>>> > We have direct I/O (O_DIRECT), for example raw devices (/dev/rawctl) that
>>> > map to the block devices, and we also have the page cache. Now if I've
>>> > understood this correctly, direct I/O will bypass this page cache, which
>>> > is fine. I'll not get into the performance debate, but what about data
>>> > consistency? The kernel cannot and __shouldn't__ try to control how
>>> > applications are written. So one bad day somebody comes up with
>>> > an application which does both types of I/O (one that goes
>>> > through the page cache and one that doesn't), and in that application,
>>> > one instance is writing directly to the backend device and the other
>>> > instance, which is not aware of this write, goes ahead and writes to the
>>> > page cache, and that write would be written later to the backend device.
>>> > So wouldn't we end up corrupting the on-disk data?
>>>
>>> Yes. And that is the responsibility of the application. While the
>>> existence of O_DIRECT may not be common sense, anyone who knows about
>>> it *must* know that it bypasses the kernel page-cache and hence *must*
>>> know the consequences of doing cached and direct I/O on the same file
>>> simultaneously.
>>>
>>> > I can think of multiple other scenarios which could corrupt the on-disk
>>> > data if there aren't any safeguarding policies employed by the kernel.
>>> > But I'm quite sure that the kernel is aware of such nasty attempts, and
>>> > I'd like to know how the kernel takes care of this.
>>>
>>> O_DIRECT is an explicit flag not enabled by default.
>>>
>>> It is the app's responsibility to ensure that it does NOT misuse the
>>> feature. Essentially, specifying the O_DIRECT flag is the app's way of
>>> saying - "Hey kernel, I know what I am doing. Please step aside and
>>> let me talk to the hardware directly. Please do NOT interfere."
>>>
>>> The kernel happily obliges.
>>>
>>> Later, the app should NOT go crying back to kernel (and blaming it),
>>> if the app manages to screw-up the direct "relationship" with the
>>> hardware.
>>
>> So leaving the hardware at the mercy of the application doesn't sound
>> like a good practice. This __may__ compromise kernel stability too. Also
>> think of this:
>>
>> In app1:
>> fdx = open("blah", O_RDWR|O_DIRECT);
>> write(fdx, buf, sizeof(buf));
>>
>> In app2 (unaware of app1):
>> fdy = open("blah", O_RDWR);
>> write(fdy, buf, sizeof(buf));
>>
>> I think this is not an unlikely thing to do, and if you agree with me then
>> we may end up with the same could-be/would-be data corruption. Now who should
>> be blamed here: app1, app2 or the kernel? Or will it be handled
>> differently here?
>
> As long as both app1 and app2 are managing separate files (even on the
> same underlying storage media), the situation looks good.
>
> From an app developer's perspective:
> In case both the apps do I/O on the same file then it implies
> knowledge of the other app. (Otherwise how would the second app know
> that the file exists at such and such location?) And hence the second
> app really ought to think about what it is going to do.
>
> case1: app1 uses regular I/O;
> ==> app2 should NOT use direct I/O.
>
> case2: app1 uses direct I/O;
> ==> app2 should NOT use regular I/O.
>
> From a kernel developer's perspective:
> The kernel driver guarantees coherency between the page-cache and
> data transferred using O_DIRECT. Refer to page 15 of this deck [1],
> which talks about the design of O_DIRECT.
>
> In either case the bigger problem lies in the fact that both the apps
> need to work out a mutex mechanism to prevent the handful of
> readers-writers problems[2] when both try to read/write from the same
> file simultaneously.
>
> So it is more important (in fact, downright necessary) to ensure mutual
> exclusion between the 2 apps during I/O. Otherwise one of them will
> end up overwriting the changes made by the other, unless both the apps
> are doing ONLY read()s.
>
> [1] http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html
> [2] http://en.wikipedia.org/wiki/Readers-writers_problem
>
>
> regards
> ChinmayVS
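
The mutual exclusion asked for above could be sketched with an advisory
lock. This is only a minimal sketch, assuming both apps cooperate and
take flock() on the shared file before touching it; the helper name
locked_append() is made up for illustration, and flock() does nothing
against an app that never asks for the lock.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to path while holding an
 * exclusive advisory lock. Returns 0 on success, -1 on error. */
static int locked_append(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* Blocks until no other cooperating process holds the lock. */
    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }

    ssize_t n = write(fd, buf, len);

    /* Release the lock explicitly; close() would also drop it. */
    flock(fd, LOCK_UN);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

Each app would call something like locked_append("blah", buf, len)
instead of a bare write(), so that only one writer is inside the
critical section at a time.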

TL;DR

1. Do not worry about coherency between the page-cache and the data
transferred using O_DIRECT. The kernel will invalidate the cached pages
for the affected range after an O_DIRECT write and flush dirty cached
pages to disk before an O_DIRECT read.

2. Use mutexes or semaphores (or any of the numerous options [1]) to
prevent the usual synchronisation problems during IPC using a shared
file.

[1] http://beej.us/guide/bgipc/output/html/singlepage/bgipc.html
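
Point 1 can be checked with a small experiment. The following is a
rough sketch, not a definitive test: one process does a buffered write,
then an O_DIRECT write to the same block, then a buffered read. The
helper name direct_then_cached() and the 4096-byte block size are
assumptions; the real alignment requirement depends on the device and
filesystem, and some filesystems reject O_DIRECT altogether.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT with glibc; must precede all includes */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0    /* headers did not expose it; the helper then returns -1 */
#endif

#define BLK 4096      /* assumed alignment and size; device-specific in reality */

/* Hypothetical helper. Returns 1 if the buffered read sees the data
 * written with O_DIRECT, 0 if it sees stale data, and -1 if O_DIRECT is
 * unavailable for this path or any call fails. */
static int direct_then_cached(const char *path)
{
    void *dbuf = NULL;
    char cbuf[BLK];
    int cfd = -1, dfd = -1, ret = -1;

    /* O_DIRECT needs a suitably aligned user buffer. */
    if (O_DIRECT == 0 || posix_memalign(&dbuf, BLK, BLK) != 0)
        return -1;

    cfd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);  /* buffered fd */
    dfd = open(path, O_RDWR | O_DIRECT);                 /* direct fd */
    if (cfd < 0 || dfd < 0)
        goto out;

    memset(cbuf, 'C', BLK);
    memset(dbuf, 'D', BLK);
    if (pwrite(cfd, cbuf, BLK, 0) != BLK)   /* lands in the page cache */
        goto out;
    if (pwrite(dfd, dbuf, BLK, 0) != BLK)   /* bypasses the page cache */
        goto out;
    if (pread(cfd, cbuf, BLK, 0) != BLK)    /* buffered read of same block */
        goto out;
    ret = (memcmp(cbuf, dbuf, BLK) == 0);
out:
    if (dfd >= 0)
        close(dfd);
    if (cfd >= 0)
        close(cfd);
    free(dbuf);
    return ret;
}
```

This only exercises strictly sequential access from a single process.
It says nothing about two apps racing each other, which is what point 2
is about.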

regards
ChinmayVS



More information about the Kernelnewbies mailing list