Split RAID: Proposal for archival RAID using incremental batch checksum

Greg Freemyer greg.freemyer at gmail.com
Sat Nov 22 09:54:39 EST 2014



On November 22, 2014 9:43:23 AM EST, Anshuman Aggarwal <anshuman.aggarwal at gmail.com> wrote:
>On 22 November 2014 at 19:33, Greg Freemyer <greg.freemyer at gmail.com>
>wrote:
>> On Sat, Nov 22, 2014 at 8:22 AM, Anshuman Aggarwal
>> <anshuman.aggarwal at gmail.com> wrote:
>>> By not using stripes, we restrict writes to just one data drive,
>>> plus the XOR output to the parity drive, which is what allows the
>>> delayed and batched checksum (resulting in fewer writes to the
>>> parity drive). The intention is that if a drive fails we might
>>> lose 1 or 2 movies, but the rest is restorable from parity.
>>>
>>> Also, another advantage over RAID5 or RAID6 is that in the event
>>> of multiple drive failures we only lose the content on the failed
>>> drives, not the whole cluster/RAID.
>>>
>>> Did I clarify better this time around?
>>
>> I still don't understand the delayed checksum/parity.
>>
>> With classic RAID 4, writing 1 GB of data to just D1 requires first
>> reading 1 GB from D1 and 1 GB from P, then writing 1 GB to each of
>> D1 and P: 4 GB worth of I/O total.
>>
>> With your proposal, if you stream 1 GB of data to a file on D1:
>>
>> - Does the old/previous data on D1 have to be read?
>>
>> - How much data goes to the parity drive?
>>
>> - Does the old data on the parity drive have to be read?
>>
>> - Why does delaying it reduce that volume compared to RAID 4?
>>
>> - In the event drive 1 fails, can its content be re-created from
>> the other drives?
>>
>> Greg
>> --
>> Greg Freemyer
>
>Two things:
>Delayed writes are basically to allow the parity drive to spin down:
>if the parity update is only 1 block, we batch it instead of spinning
>up the drive for every write (obviously the data drive has to be spun
>up). Delays will be both time and size constrained.
>For a large write, such as 1 GB of data to a file, a configurable
>maximum delay limit would be hit, which would dump to the parity
>drive immediately, preventing memory overuse.
>
>This again ties in to the fact that the content is not 'critical':
>even if the parity had not been dumped when a drive fails, worst case
>you only lose the latest file.
>
>Delayed writes may be done via bcache or a similar implementation
>that caches the writes in memory, and need not be part of the split
>RAID driver at all.

That provided little clarity.

File systems like xfs queue (delay) significant amounts of actual data before writing it to disk.  The same is true of journal data.  If all you are doing is caching the parity until there is enough to bother with, then a filesystem designed for streamed data already does that for the data drive, so you don't need to do anything new for the parity drive; just run it in sync with the data drive.
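
For concreteness, here is a toy model of what I think you mean by the delayed parity (all names are made up; this is not a real dm or bcache interface).  Note the old block contents still have to be read to compute each delta, so batching saves parity-drive spin-ups and writes, not total reads:

/* Toy model of the delayed/batched parity update as I read it.
 * Hypothetical sketch only.  Each data write folds its XOR delta
 * into an in-memory cache; the parity drive then sees one write
 * when the cache is flushed on a time or size bound. */
#include <stdio.h>

#define BLOCK 8                       /* bytes per "block" for the demo */

static unsigned char parity_cache[BLOCK]; /* pending parity delta */
static int pending;                       /* batched update count */

/* Write new_data to a data-device block.  The delta old ^ new is
 * accumulated in memory -- the old contents must still be available,
 * which is why delaying does not reduce the read volume. */
static void write_block(unsigned char *dev, const unsigned char *new_data)
{
    for (int i = 0; i < BLOCK; i++) {
        parity_cache[i] ^= dev[i] ^ new_data[i];
        dev[i] = new_data[i];
    }
    pending++;
}

/* One spin-up of the parity drive absorbs all pending deltas. */
static void flush_parity(unsigned char *parity_drive)
{
    for (int i = 0; i < BLOCK; i++) {
        parity_drive[i] ^= parity_cache[i];
        parity_cache[i] = 0;
    }
    printf("flushed %d batched updates in one parity write\n", pending);
    pending = 0;
}

int main(void)
{
    unsigned char d1[BLOCK] = {0}, parity[BLOCK] = {0};
    unsigned char v1[BLOCK] = "ver_one", v2[BLOCK] = "ver_two";

    write_block(d1, v1);
    write_block(d1, v2);      /* second rewrite before any flush */
    flush_parity(parity);     /* parity now matches d1's contents */
    return 0;
}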

At this point I interpret your proposal to be:

Implement a RAID 4-like setup, but instead of striping the data drives, concatenate them.

That is something I haven't seen done, but I can see why you would want it.  Implementing it via unionfs I don't understand, but as a new device mapper mechanism it seems very logical.
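
To be explicit about what concatenation buys you, here is a rough sketch of the two address mappings (made-up geometry, not dm table syntax; assumes equal-size drives):

/* Sketch of the block-mapping difference.  Hypothetical numbers:
 * three equal-size data drives of 1000 blocks each. */
#include <stdio.h>

#define N_DRIVES 3
#define BLOCKS_PER_DRIVE 1000L

struct target { int drive; long offset; };

/* Concatenated (linear): a logical extent lives on one drive, so a
 * streamed write touches one data drive plus the parity drive. */
static struct target map_concat(long lba)
{
    struct target t = { (int)(lba / BLOCKS_PER_DRIVE),
                        lba % BLOCKS_PER_DRIVE };
    return t;
}

/* Classic striping, for contrast: consecutive blocks rotate across
 * all drives, so every large write spins up the whole array. */
static struct target map_stripe(long lba)
{
    struct target t = { (int)(lba % N_DRIVES), lba / N_DRIVES };
    return t;
}

int main(void)
{
    for (long lba = 998; lba <= 1001; lba++) {
        struct target c = map_concat(lba), s = map_stripe(lba);
        printf("lba %4ld: concat -> drive %d off %3ld | stripe -> drive %d off %3ld\n",
               lba, c.drive, c.offset, s.drive, s.offset);
    }
    return 0;
}

With the linear mapping a streamed write touches one data drive plus parity; striping would spin up every member of the array.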

Obviously, I'm not a device mapper maintainer, so I'm not saying it would be accepted, but if I'm right, you can now frame the discussion with just a few sentences that explain your goal.
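
For completeness, recovery in such a scheme would presumably be the usual single-parity XOR reconstruction across the survivors, along these lines (toy layout, made-up contents):

/* Minimal sketch of recovery: with one parity drive, a failed member
 * is the XOR of the surviving members and the parity. */
#include <stdio.h>
#include <string.h>

#define BLOCK 8
#define N_DATA 3

int main(void)
{
    unsigned char drive[N_DATA][BLOCK] = { "movie_a", "movie_b", "movie_c" };
    unsigned char parity[BLOCK] = {0}, rebuilt[BLOCK];

    /* parity is the XOR of every data drive */
    for (int d = 0; d < N_DATA; d++)
        for (int i = 0; i < BLOCK; i++)
            parity[i] ^= drive[d][i];

    /* pretend drive 1 died: XOR the parity with the survivors */
    memcpy(rebuilt, parity, BLOCK);
    for (int d = 0; d < N_DATA; d++) {
        if (d == 1)
            continue;                 /* skip the failed member */
        for (int i = 0; i < BLOCK; i++)
            rebuilt[i] ^= drive[d][i];
    }

    printf("recovered: %s\n", (char *)rebuilt); /* prints "movie_b" */
    return 0;
}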

Greg
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


