Split RAID: Proposal for archival RAID using incremental batch checksum

Anshuman Aggarwal anshuman.aggarwal at gmail.com
Mon Nov 24 01:48:48 EST 2014


Sandeep,
 This isn't exactly RAID4; the only thing it has in common is a single
parity disk, but the data is not striped at all. I did bring it up on
the linux-raid mailing list and had a short conversation with Neil.
He wasn't too excited about device mapper but didn't indicate why or
why not.
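
To make the layout concrete: the parity maths is the usual RAID4 XOR,
just computed over whole independent disks instead of stripes, so an
overwrite on one member only needs the old data block and the old
parity block. A tiny user-space illustration (the disk count, block
size and "movie" contents are obviously just for the demo):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define NDISKS 3          /* independent data disks */
#define BLOCK  8          /* tiny block just for the demo */

/* new_parity = old_parity ^ old_data ^ new_data:
 * updating one data disk only touches that disk and the parity disk. */
static void update_parity(uint8_t *parity, const uint8_t *old_data,
                          const uint8_t *new_data, size_t len)
{
        for (size_t i = 0; i < len; i++)
                parity[i] ^= old_data[i] ^ new_data[i];
}

int main(void)
{
        uint8_t disk[NDISKS][BLOCK] = { "movie1", "movie2", "movie3" };
        uint8_t parity[BLOCK] = { 0 };

        /* Parity block = XOR of the same block on every data disk. */
        for (int d = 0; d < NDISKS; d++)
                for (int i = 0; i < BLOCK; i++)
                        parity[i] ^= disk[d][i];

        /* Overwrite a block on disk 1 and update parity incrementally. */
        uint8_t newdata[BLOCK] = "movie2b";
        update_parity(parity, disk[1], newdata, BLOCK);
        memcpy(disk[1], newdata, BLOCK);

        /* "Lose" disk 1 and rebuild it from parity + surviving disks. */
        uint8_t rebuilt[BLOCK];
        for (int i = 0; i < BLOCK; i++)
                rebuilt[i] = parity[i] ^ disk[0][i] ^ disk[2][i];

        printf("rebuilt disk 1: %s\n", (char *)rebuilt);  /* "movie2b" */
        return 0;
}

The point being that reads never touch the parity disk at all, and a
write touches exactly two disks.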

I would like to have this as a layer on top of each of the original
block devices, intercepting write requests to the data devices and
updating the parity disk. Is device mapper the right interface for
that? What are the others? Also, if I don't store the metadata on the
block device itself (to allow the block device to be unaware of the
RAID4-like layer on top), how would the kernel be informed of which
devices together form the Split RAID?
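
For what it's worth, here is roughly the shape of the device-mapper
target I'm imagining, sketched against a recent kernel's dm target
API. It only remaps I/O to a single data disk; the split-raid and
split_raid_* names and the one-argument constructor are placeholders
I made up, and the actual parity batching is just a comment:

/*
 * Very rough sketch of a "split-raid" device-mapper target, assuming a
 * recent kernel's dm target API.  It simply remaps I/O to one underlying
 * data disk; the parity handling is only marked with a comment.
 */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/bio.h>
#include <linux/device-mapper.h>

struct split_raid_ctx {
        struct dm_dev *data_dev;  /* the independent data disk */
        /* TODO: reference to the shared parity device + batching state */
};

/* Constructor: "split-raid <data device path>" (placeholder syntax). */
static int split_raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
        struct split_raid_ctx *c;
        int ret;

        if (argc != 1) {
                ti->error = "Expected 1 argument: <data dev>";
                return -EINVAL;
        }

        c = kzalloc(sizeof(*c), GFP_KERNEL);
        if (!c)
                return -ENOMEM;

        ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
                            &c->data_dev);
        if (ret) {
                kfree(c);
                ti->error = "Data device lookup failed";
                return ret;
        }

        ti->private = c;
        return 0;
}

static void split_raid_dtr(struct dm_target *ti)
{
        struct split_raid_ctx *c = ti->private;

        dm_put_device(ti, c->data_dev);
        kfree(c);
}

static int split_raid_map(struct dm_target *ti, struct bio *bio)
{
        struct split_raid_ctx *c = ti->private;

        if (bio_data_dir(bio) == WRITE) {
                /* Here the old data would be read and the XOR delta
                 * queued for a delayed, batched write to the parity
                 * disk (deliberately omitted in this sketch). */
        }

        /* Pass the I/O straight through to the data disk. */
        bio_set_dev(bio, c->data_dev->bdev);
        bio->bi_iter.bi_sector = dm_target_offset(ti, bio->bi_iter.bi_sector);
        return DM_MAPIO_REMAPPED;
}

static struct target_type split_raid_target = {
        .name    = "split-raid",
        .version = {0, 1, 0},
        .module  = THIS_MODULE,
        .ctr     = split_raid_ctr,
        .dtr     = split_raid_dtr,
        .map     = split_raid_map,
};

static int __init split_raid_init(void)
{
        return dm_register_target(&split_raid_target);
}

static void __exit split_raid_exit(void)
{
        dm_unregister_target(&split_raid_target);
}

module_init(split_raid_init);
module_exit(split_raid_exit);
MODULE_LICENSE("GPL");

The idea would be one such target instance per data disk, with the
shared parity device and the time/size-bounded batching sitting behind
the hook in split_raid_map(). The metadata question above is about how
those instances would find each other without writing anything to the
member disks themselves.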

Appreciate the help.

Thanks,
Anshuman

On 24 November 2014 at 11:06, SandeepKsinha <sandeepksinha at gmail.com> wrote:
>
>
> On Sat, Nov 22, 2014 at 8:24 PM, Greg Freemyer <greg.freemyer at gmail.com>
> wrote:
>>
>>
>>
>> On November 22, 2014 9:43:23 AM EST, Anshuman Aggarwal
>> <anshuman.aggarwal at gmail.com> wrote:
>> >On 22 November 2014 at 19:33, Greg Freemyer <greg.freemyer at gmail.com>
>> >wrote:
>> >> On Sat, Nov 22, 2014 at 8:22 AM, Anshuman Aggarwal
>> >> <anshuman.aggarwal at gmail.com> wrote:
>> >>> By not using stripes, we restrict writes to happen to just 1 drive
>> >>> and the XOR output to the parity drive, which then explains the
>> >>> delayed and batched checksum (resulting in fewer writes to the
>> >>> parity drive). The intention is that if a drive fails then maybe we
>> >>> lose 1 or 2 movies but the rest is restorable from parity.
>> >>>
>> >>> Also another advantage over RAID5 or RAID6 is that in the event of
>> >>> multiple drive failure we only lose the content on the failed drive
>> >>> not the whole cluster/RAID.
>> >>>
>> >>> Did I clarify better this time around?
>> >>
>> >> I still don't understand the delayed checksum/parity.
>> >>
>> >> With classic raid 4, writing 1 GB of data to just D1 would require 1
>> >> GB of data first be read from D1 and 1 GB read from P then 1 GB
>> >> written to both D1 and P.  4 GB worth of I/O total.
>> >>
>> >> With your proposal, if you stream 1 GB of data to a file on D1:
>> >>
>> >> - Does the old/previous data on D1 have to be read?
>> >>
>> >> -  How much data goes to the parity drive?
>> >>
>> >> - Does the old data on the parity drive have to be read?
>> >>
>> >> -  Why does delaying it reduce that volume compared to Raid 4?
>> >>
>> >> -  In the event drive 1 fails, can its content be re-created from the
>> >> other drives?
>> >>
>> >> Greg
>> >> --
>> >> Greg Freemyer
>> >
>> >Two things:
>> >Delayed writes are basically there to allow the parity drive to spin
>> >down: the parity update is written as one batch instead of spinning
>> >up the drive for every write (obviously the data drive has to be spun
>> >up). Delays will be both time- and size-constrained.
>> >A large write, such as 1 GB of data to a file, would hit a
>> >configurable maximum delay limit, which would flush to the parity
>> >drive immediately and prevent memory overuse.
>> >
>> >This again ties in to the fact that the content is not 'critical', so
>> >if parity has not been flushed when a drive fails, worst case you
>> >only lose the latest file.
>> >
>> >Delayed writes may be done via bcache or a similar implementation
>> >that caches the writes in memory, and need not be part of the Split
>> >RAID driver at all.
>>
>> That provided little clarity.
>>
>> File systems like xfs queue (delay) significant amounts of actual data
>> before writing it to disk.  The same is true of journal data.  If all you
>> are doing is caching the parity until there is enough to bother with,
>> then a filesystem designed for streamed data already does that for the
>> data drive, so you don't need to do anything new for the parity drive;
>> just run it in sync with the data drive.
>>
>> At this point I interpret your proposal to be:
>>
>> Implement a RAID 4-like setup, but instead of striping across the data
>> drives, concatenate them.
>>
>> That is something I haven't seen done, but I can see why you would want
>> it.  I don't understand implementing it via unionfs, but as a new device
>> mapper mechanism it seems very logical.
>>
>> Obviously, I'm not a device mapper maintainer, so I'm not saying it would
>> be accepted, but if I'm right you can now have a discussion of just a few
>> sentences which explain your goal.
>>
>
> RAID4 support does not exist in the mainline. Anshuman, you might want to
> reach out to Neil Brown, who is the maintainer for dmraid.
> IIUC, your requirement can be implemented well by writing a new device
> mapper target. That will keep it modular and make it easier to improve.
>
>
>>
>> Greg
>> --
>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>>
>> _______________________________________________
>> Kernelnewbies mailing list
>> Kernelnewbies at kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
> --
> Regards,
> Sandeep.
>
> “To learn is to change. Education is a process that changes the learner.”


