Split RAID: Proposal for archival RAID using incremental batch checksum

Thu Nov 27 13:31:24 EST 2014

On Thu, Nov 27, 2014 at 12:50 PM, Anshuman Aggarwal
<anshuman.aggarwal at gmail.com> wrote:
> On 25 November 2014 at 10:26, Greg Freemyer <greg.freemyer at gmail.com> wrote:
>>
>>
>> On November 24, 2014 12:28:08 PM EST, Anshuman Aggarwal <anshuman.aggarwal at gmail.com> wrote:
>>>On 24 November 2014 at 18:49, Greg Freemyer <greg.freemyer at gmail.com>
>>>wrote:
<snip>

>>>>> Also if I don't store the metadata on
>>>>>the block device itself (to allow the block device to be unaware of
>>>>>the RAID4 on top), how would the kernel be informed of which devices
>>>>>together form the Split RAID?
>>>>
>>>> I don't understand the question.
>>>
>>>mdadm typically has a metadata superblock stored on the block device
>>>which identifies the block device as part of the RAID and typically
>>>prevents it from being directly recognized by file system code. I was
>>>wondering if Split RAID block devices can be made unaware of the
>>>RAID scheme on top and be fully mountable and usable without the RAID
>>>drivers (of course invalidating the parity if any of them are written
>>>to). This allows a parity disk to be added to existing block devices
>>>without having to set up the superblock on the underlying devices.
>>>
>>>Hope that is clear now?
>>
>> Thank you, I knew about the superblock, but didn't realize that was what you were talking about.
>>
>> Does this address your desire?
>>
>> https://raid.wiki.kernel.org/index.php/RAID_superblock_formats#mdadm_v3.0_--_Adding_the_Concept_of_User-Space_Managed_External_Metadata_Formats
>>
>> FYI: I'm ignorant of any real details and I have not used the above new feature, but it seems to be what you are asking for.
>>
>
> It doesn't seem to, because it appears that the unified container would
> still need to be created before putting any data on the device.
> Ideally, the split raid can be added as an afterthought by just
> adding a parity disk (block device) to an existing set of disks (block
> devices).
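Adding a parity disk after the fact, as described above, amounts to one read-only pass over the existing members that XORs corresponding blocks and writes the result to the new parity device. The Python below is a minimal sketch of that pass, not code from md or mdadm: `xor_blocks`, `build_parity` and the 4096-byte block size are hypothetical, file-like objects stand in for block devices, and it assumes a member shorter than the others is treated as zero-padded (so the parity device must be at least as large as the largest member).

```python
import io
from typing import BinaryIO, Sequence

BLOCK_SIZE = 4096  # hypothetical block size


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        if len(blk) != len(out):
            raise ValueError("all blocks must be the same size")
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def build_parity(members: Sequence[BinaryIO], parity: BinaryIO,
                 block_size: int = BLOCK_SIZE) -> int:
    """Read every member once, front to back, and write XOR parity.

    Members are only read, never written. Returns the number of parity
    blocks written.
    """
    for dev in members:
        dev.seek(0)
    parity.seek(0)
    written = 0
    while True:
        chunks = [dev.read(block_size) for dev in members]
        if not any(chunks):  # every member is exhausted
            break
        # Assumption: a member that has run out (or a short final read) counts as zeros.
        padded = [c.ljust(block_size, b"\0") for c in chunks]
        parity.write(xor_blocks(padded))
        written += 1
    return written


if __name__ == "__main__":
    a = io.BytesIO(b"\x01\x02\x03\x04")
    b = io.BytesIO(b"\x10\x20")  # a smaller member
    p = io.BytesIO()
    build_parity([a, b], p, block_size=4)
    print(p.getvalue().hex())  # 11220304
```

The members are only ever read, which is the property being asked for: each data disk stays mountable on its own, and writing to one directly simply leaves the parity stale until it is rebuilt.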

So what precisely does "creating a container" really do?

i.e., have you run strace on "mdadm --create --verbose /dev/md/imsm
/dev/sd[b-g] --raid-devices 4 --metadata=imsm"?

I'm assuming for your use case /etc/ could hold a metadata file that
defined a container and then a second metadata file that defined the
splitRAID setup.
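One way to picture the /etc/ idea is a small config file that names the data members and the parity device, so nothing has to be written onto the members themselves. The path, section name and keys below are all hypothetical (this is not an mdadm format), the sketch covers only the split-RAID definition rather than a separate container file, and Python's standard `configparser` does the parsing.

```python
import configparser
from typing import List, Tuple

# Hypothetical contents of a file such as /etc/splitraid.conf (not an mdadm format).
EXAMPLE = """
[splitraid]
members = /dev/sdb1 /dev/sdc1 /dev/sdd1
parity  = /dev/sde
"""


def load_splitraid(text: str) -> Tuple[List[str], str]:
    """Return (data member devices, parity device) from the hypothetical config."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    section = cfg["splitraid"]  # KeyError if the section is missing
    members = section.get("members", fallback="").split()
    parity = section.get("parity", fallback="").strip()
    if not members:
        raise ValueError("no data members listed")
    if not parity:
        raise ValueError("no parity device listed")
    if len(set(members)) != len(members):
        raise ValueError("a data member is listed twice")
    if parity in members:
        raise ValueError("the parity device cannot also be a data member")
    return members, parity


if __name__ == "__main__":
    print(load_splitraid(EXAMPLE))
```

Because the members would be identified only by this file, stable names (for example paths under /dev/disk/by-id/) would be safer than /dev/sdX names, which can change between boots.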

>>>>
>>>> The filesystem has no knowledge there is a split raid below it.  It
>>>simply reads/writes to the overall device; device mapper is layered below it
>>>and triggers the required i/o calls.
>>>>
>>>> I.e., for a read, it is a straight passthrough.  For a write, the old
>>>data and old parity have to be read in, modified, and written out.  Device
>>>mapper does this now for raid 4/5/6, so most of the code is in place.
>>>
>>>Exactly. Reads are passthrough; writes trigger a parity write.
>>>The only remaining concern for me is that the md superblock
>>>will require the block device to be initialized using mdadm. That can be
>>>acceptable, I suppose, but an ideal solution would be able to use
>>>existing block devices (which would be untouched), put a passthrough
>>>block device on top of them, and manage the parity updates on the
>>>parity block device. The information about which block devices
>>>comprise the array can be stored in a config file etc., and does not
>>>need a superblock as badly as a RAID setup does.
>>
>> Hopefully the new user space feature does just that.
>>
>> Greg
>
> Although the user space feature doesn't seem to, Neil has suggested a
> way to try out using RAID-4 in a manner so as to create a split raid
> like array. Will post on this mailing list if it succeeds.
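The write path described in the exchange above (reads pass straight through; a write reads the old data and old parity, then writes both back) rests on the identity new_parity = old_parity XOR old_data XOR new_data. The Python below is a minimal sketch of that read-modify-write, assuming a non-striped layout in which byte offset N on a member maps to byte offset N on the parity device; `rmw_parity`, `write_block` and `_read_at` are hypothetical names, and file-like objects stand in for block devices.

```python
import io
from typing import BinaryIO


def rmw_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """new_parity = old_parity XOR old_data XOR new_data, byte by byte."""
    if not (len(old_data) == len(new_data) == len(old_parity)):
        raise ValueError("old data, new data and old parity must be the same length")
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


def _read_at(dev: BinaryIO, offset: int, length: int) -> bytes:
    dev.seek(offset)
    # Assumption: reading past the end of a shorter member counts as zeros.
    return dev.read(length).ljust(length, b"\0")


def write_block(member: BinaryIO, parity: BinaryIO, offset: int, new_data: bytes) -> None:
    """Write new_data to member at offset and update parity at the same offset."""
    old_data = _read_at(member, offset, len(new_data))    # read old data
    old_parity = _read_at(parity, offset, len(new_data))  # read old parity
    new_parity = rmw_parity(old_data, new_data, old_parity)
    member.seek(offset)
    member.write(new_data)    # write new data
    parity.seek(offset)
    parity.write(new_parity)  # write new parity (not atomic with the write above)


if __name__ == "__main__":
    a = io.BytesIO(b"\x01\x02")
    b = io.BytesIO(b"\x10\x20")
    p = io.BytesIO(bytes([0x11, 0x22]))  # parity of a and b
    write_block(a, p, 0, b"\xff\x00")
    print(p.getvalue().hex())  # ef20
```

Only the member being written and the parity device are touched; the other members are not even read. The two writes at the end are not atomic, so a crash between them leaves the parity inconsistent (the classic RAID "write hole"), which a real implementation would need to handle with a journal or a resync.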

I've used a hardware RAID setup with RAID-1 that did what you want.  If
needed, you could pull out a drive, connect it straight to another
computer and everything just worked (except mirroring).

Since you're working with Neil you have the expert on the case, but
don't forget most drives have unused space between sector 1 and the
start of the first partition.  Traditionally, sectors 1-62 were
unused/blank.  Newer systems start the first partition at sector 2048,
so sectors 1-2047 are blank.
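To check how large that gap is on a given MBR disk, it is enough to read sector 0 and find the lowest partition start. The sketch below assumes 512-byte sectors and the standard MBR layout (four 16-byte entries at offset 446, with the start LBA and sector count stored as little-endian 32-bit values at entry offsets 8 and 12, and the 0x55AA signature at offset 510); `mbr_gap` is a hypothetical helper.

```python
import struct

SECTOR = 512  # assumption: 512-byte logical sectors


def mbr_gap(sector0: bytes) -> range:
    """Return the LBAs between the MBR (LBA 0) and the first partition."""
    if len(sector0) < SECTOR or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    starts = []
    for i in range(4):  # four primary partition entries
        off = 446 + 16 * i
        ptype = sector0[off + 4]
        start_lba, num_sectors = struct.unpack_from("<II", sector0, off + 8)
        if ptype == 0xEE:
            raise ValueError("protective MBR: this is a GPT disk")
        if ptype != 0 and num_sectors != 0:  # skip empty slots
            starts.append(start_lba)
    if not starts:
        raise ValueError("no partitions in the MBR")
    return range(1, min(starts))
```

For example, as root: `with open("/dev/sdb", "rb") as f: print(mbr_gap(f.read(512)))`. "Blank by convention" is not a guarantee, though: GRUB 2, for instance, embeds its core image in this gap on MBR disks, so it is worth checking what is already there before reusing it.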

I don't recall off-hand which sectors a GPT setup uses, but I assume
you can find an area that is rarely used.
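For GPT, the UEFI layout is: LBA 0 holds a protective MBR, LBA 1 the primary GPT header, and the partition entry array normally follows at LBA 2-33 (128 entries of 128 bytes with 512-byte sectors), with a backup array and header at the end of the disk. The header's "first usable LBA" field (commonly 34) therefore marks where free space starts, and the gap runs up to the first partition (commonly 2048). The Python sketch below reads those fields from a buffer holding the start of a disk; `gpt_front_gap` is a hypothetical helper, and it does not verify the header or array CRC32s.

```python
import struct


def gpt_front_gap(head: bytes, sector: int = 512) -> range:
    """Return the LBAs between the end of the primary GPT and the first partition.

    `head` must cover LBA 0 through the end of the partition entry array
    (LBA 0-33 on a typical 512-byte-sector disk). CRC32 fields are not checked.
    """
    hdr = head[sector:2 * sector]  # primary GPT header is at LBA 1
    if hdr[:8] != b"EFI PART":
        raise ValueError("no GPT header at LBA 1")
    (first_usable,) = struct.unpack_from("<Q", hdr, 40)
    entries_lba, n_entries, entry_size = struct.unpack_from("<QII", hdr, 72)
    base = entries_lba * sector
    starts = []
    for i in range(n_entries):
        entry = head[base + i * entry_size: base + (i + 1) * entry_size]
        if len(entry) < entry_size:
            raise ValueError("buffer too short for the partition entry array")
        if entry[:16] == bytes(16):  # all-zero type GUID marks an unused entry
            continue
        (first_lba,) = struct.unpack_from("<Q", entry, 32)
        starts.append(first_lba)
    if not starts:
        raise ValueError("no partitions in the GPT")
    return range(first_usable, min(starts))
```

For example, as root: `with open("/dev/sdb", "rb") as f: print(gpt_front_gap(f.read(34 * 512)))`. The sector size is a parameter because on 4K-native disks the same 16 KiB entry array occupies fewer sectors.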

Greg