block size vs bvec length

Valdis Klētnieks valdis.kletnieks at vt.edu
Sun Apr 5 19:35:25 EDT 2020


On Sun, 05 Apr 2020 19:17:39 +0100, Michele Sorcinelli said:

> I used rq_for_each_segment() to print bvec.bv_len of the segments and it
> appears to be 4096.
>
> Why is it 4096 rather than 512?

What is the actual device backing this block device?

> Also writing a block of 4096 bytes with dd to /dev/myblock will result in a
> single write request, while writing a block of 512 bytes will result in a read
> request followed by a write request.
>
> Can someone explain this behavior?

That's called a read-modify-write (RMW) cycle. It's needed whenever a write
request doesn't cover whole physical blocks (it starts or ends partway through
a block), and it happens for regular files as well; it's just hidden by the
file system layer.

Say you have a device or file with a 4096-byte physical block size. You want
to write 256 bytes, starting at an offset of 512 bytes into the file. To avoid
destroying the *rest* of the 4096-byte block, this is what happens:

You read the entire 4096-byte block into a buffer, which now holds the entire
old contents of that block.  You then copy the 256 bytes into the appropriate
section of the buffer, so it now contains the old data except where the new
data has been copied.  You then write the entire updated 4096-byte buffer back
to the device.
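The read, patch, write sequence above can be sketched in Python against a toy
device. This is a minimal illustration, not how the kernel's page cache or
block layer actually implements it: `BlockDevice`, `read_block`, `write_block`
and `rmw_write` are hypothetical names, the "device" is just a list of
in-memory 4096-byte blocks, and the write is assumed to fit inside a single
block. It replays the example of 256 bytes written at offset 512.

```python
BLOCK_SIZE = 4096  # assumed physical block size, as in the example above


class BlockDevice:
    """Hypothetical toy device that can only read or write whole blocks."""

    def __init__(self, num_blocks: int, fill: bytes = b"\x00") -> None:
        self.blocks = [bytearray(fill * BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, blk: int) -> bytearray:
        # Return a copy, like a real read into a separate buffer.
        return bytearray(self.blocks[blk])

    def write_block(self, blk: int, buf: bytes) -> None:
        assert len(buf) == BLOCK_SIZE, "device only accepts whole blocks"
        self.blocks[blk] = bytearray(buf)


def rmw_write(dev: BlockDevice, offset: int, data: bytes) -> None:
    """Write data at a byte offset; this sketch assumes it fits in one block."""
    blk, start = divmod(offset, BLOCK_SIZE)
    assert start + len(data) <= BLOCK_SIZE, "sketch handles one block only"
    buf = dev.read_block(blk)              # read: fetch the old contents
    buf[start:start + len(data)] = data    # modify: patch in the new bytes
    dev.write_block(blk, buf)              # write: store the whole block back


dev = BlockDevice(num_blocks=4, fill=b"A")   # old contents: all 'A'
rmw_write(dev, 512, b"B" * 256)              # 256 bytes at offset 512
block = dev.blocks[0]
print(block[511:513], block[767:769])        # bytearray(b'AB') bytearray(b'BA')
```

Only bytes 512 through 767 change; everything else in the block survives
because the whole block was read back before it was rewritten.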

This becomes a major headache for high-performance disk I/O.  When you're
trying to write data out at 5 gigabytes/second, the last thing you need is some
researcher using the wrong write buffer size and making every write to a RAID6
into a read-modify-write.

Actually, I take that back - using the wrong buffer size *and* a bollixed
offset so half the writes end up being *two* RMW cycles is the last thing you
need :)
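To see why a bad offset doubles the pain, it helps to count how many blocks a
write only partially covers: each partially covered block needs its own RMW
cycle, while fully covered blocks can be written directly. The sketch below
assumes a 4096-byte block and treats every partial block independently;
`rmw_cycles` is a hypothetical helper, not a kernel API. With it, a 4096-byte
write at offset 0 needs no RMW, while the same 4096 bytes at offset 2048
straddle two blocks and need two.

```python
def rmw_cycles(offset: int, length: int, block_size: int = 4096) -> int:
    """Count blocks a write only partially covers (one RMW cycle each).

    Hypothetical helper; assumes each partial block is handled on its own.
    """
    if length <= 0:
        return 0
    end = offset + length                  # one past the last byte written
    first = offset // block_size           # first block touched
    last = (end - 1) // block_size         # last block touched
    head_partial = offset % block_size != 0
    tail_partial = end % block_size != 0
    if first == last:
        # Single block: RMW is needed unless the write covers it exactly.
        return 1 if (head_partial or tail_partial) else 0
    # Several blocks: only the first and last can be partial.
    return int(head_partial) + int(tail_partial)


for off, n in [(0, 4096), (0, 512), (512, 256), (2048, 4096)]:
    print(f"offset={off:5d} length={n:5d} -> {rmw_cycles(off, n)} RMW cycle(s)")
```

As a rough model, the same arithmetic applies one level up on a RAID6 array
if `block_size` is replaced by the array's full-stripe data size, since a
partial-stripe write forces the array to read existing data or parity back
before it can compute and write the new parity.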

And if the researcher manages to screw up the stripe size as well - that
usually results in 3 sysadmins with clue-by-4s visiting the researcher to
advise them on the error of their ways. :)



More information about the Kernelnewbies mailing list