Split RAID: Proposal for archival RAID using incremental batch checksum

Greg Freemyer greg.freemyer at gmail.com
Sat Nov 22 09:03:42 EST 2014


On Sat, Nov 22, 2014 at 8:22 AM, Anshuman Aggarwal
<anshuman.aggarwal at gmail.com> wrote:
> By not using stripes, we restrict writes to just one data drive plus
> the XOR output to the parity drive, which is what allows the delayed,
> batched checksum (resulting in fewer writes to the parity drive). The
> intention is that if a drive fails we maybe lose 1 or 2 movies, but
> the rest is restorable from parity.
>
> Another advantage over RAID 5 or RAID 6 is that in the event of
> multiple drive failures we only lose the content on the failed
> drives, not the whole cluster/RAID.
>
> Did I clarify better this time around?

I still don't understand the delayed checksum/parity.

With classic RAID 4, writing 1 GB of data to just D1 would require that
1 GB first be read from D1 and 1 GB be read from P, then 1 GB be
written to both D1 and P: 4 GB worth of I/O in total.
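
For concreteness, here's a minimal sketch of that classic small-write
(read-modify-write) parity update in Python.  The read_block and
write_block helpers are hypothetical stand-ins for real block-device
I/O, but the XOR arithmetic and the four transfers per block are the
standard RAID 4 accounting:

def xor_blocks(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equally sized blocks.
    return bytes(x ^ y for x, y in zip(a, b))

def rmw_update(read_block, write_block, lba: int, new_data: bytes) -> int:
    # Update one block on data drive D1 and the matching parity block
    # on P.  Returns the number of block transfers, i.e. why writing
    # N bytes of new data costs roughly 4*N bytes of total I/O.
    old_data = read_block("D1", lba)      # 1: read old data from D1
    old_parity = read_block("P", lba)     # 2: read old parity from P
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    write_block("D1", lba, new_data)      # 3: write new data to D1
    write_block("P", lba, new_parity)     # 4: write new parity to P
    return 4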

With your proposal, if you stream 1 GB of data to a file on D1:

- Does the old/previous data on D1 have to be read?

- How much data goes to the parity drive?

- Does the old data on the parity drive have to be read?

- Why does delaying it reduce that volume compared to RAID 4?

- In the event drive 1 fails, can its content be re-created from the
other drives? (See the sketch below.)
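
For reference, and assuming the parity drive really does hold a plain
XOR across all the data drives (as in RAID 4), the contents of any one
failed drive should be reconstructible as the XOR of the matching
blocks on the parity drive and the surviving data drives, something
like this sketch:

from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equally sized blocks.
    return bytes(x ^ y for x, y in zip(a, b))

def rebuild_block(surviving_blocks: list[bytes]) -> bytes:
    # Recover the failed drive's block from the matching blocks on the
    # parity drive and every surviving data drive.
    return reduce(xor_blocks, surviving_blocks)

But that only holds for blocks whose batched parity has already been
written out; anything written since the last parity batch would be
lost, which I assume is what the "maybe we lose 1 or 2 movies" comment
refers to.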

Greg
--
Greg Freemyer


