BFQ: simple elevator

Valdis.Kletnieks at vt.edu
Thu Mar 21 05:13:12 EDT 2013


On Wed, 20 Mar 2013 16:37:41 -0700, Raymond Jennings said:

> Hmm...Maybe a hybrid approach that allows a finite number of reverse
> seeks, or as I suspect deadline does a finite delay before abandoning
> the close stuff to march to the boonies.

Maybe. Maybe not.  It's going to depend on the workload - look how many times
we've had to tweak something as obvious as cache writeback to get it to behave
for corner cases.  You'll think you got the algorithm right, and then the next
guy to test-drive it will do something only 5% different and end up cratering
the disk. :)
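The hybrid suggested in the quoted text can be sketched roughly as follows. The scheduler sweeps in one direction, but it gives up on nearby requests once the oldest pending one has waited too long. This is a toy model, not the kernel's deadline or BFQ code. The `Request` type, the `deadline_ms` parameter, and the `pick_next`/`run` helpers are hypothetical, and time is just an integer counter where every request costs the same service time.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sector: int      # target position on disk (hypothetical units)
    arrival_ms: int  # when the request was queued


def pick_next(queue, head, now_ms, deadline_ms=500):
    """Choose the next request for a one-directional (C-SCAN-like) sweep.

    If the oldest request has waited longer than deadline_ms, serve it
    immediately even if that costs a long seek ("march to the boonies").
    Otherwise take the nearest request at or beyond the head, wrapping
    around to the lowest sector when nothing lies ahead.
    """
    if not queue:
        return None
    oldest = min(queue, key=lambda r: r.arrival_ms)
    if now_ms - oldest.arrival_ms > deadline_ms:
        return oldest
    ahead = [r for r in queue if r.sector >= head]
    if ahead:
        return min(ahead, key=lambda r: r.sector)
    return min(queue, key=lambda r: r.sector)


def run(requests, head=0, now_ms=0, service_ms=10, deadline_ms=500):
    """Drain the queue and return the sectors in the order they were served."""
    queue = list(requests)
    order = []
    while queue:
        req = pick_next(queue, head, now_ms, deadline_ms)
        queue.remove(req)
        order.append(req.sector)
        head = req.sector
        now_ms += service_ms  # assumption: uniform per-request service time
    return order


reqs = [Request(900, 0), Request(100, 5), Request(110, 5), Request(120, 5)]
print(run(reqs, head=100, now_ms=5, deadline_ms=500))  # patient: finish the close stuff first
print(run(reqs, head=100, now_ms=5, deadline_ms=15))   # impatient: jump out to sector 900 early
```

With a generous deadline, the sweep finishes the clustered requests before going out to sector 900. With a tight one, it abandons the cluster partway through and pays for an extra long seek back. Which of those wins is exactly the workload-dependent question above.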

Now of course, the flip side of "a disk's average seek time is between 5ms and
12ms depending on how much you paid for it" is that there's no spinning disk on
the planet that can do much more than 200 seeks per second (oh, and before you
knee-jerk and say "SSD to the rescue", that's got its own issues). Right now,
you should be thinking "so *that* is why xfs and ext4 do extents - so we can
keep file I/O as sequential as possible with as few seeks as possible". Other
things you start doing if you want *real* throughput: you start looking at
striped and parallel filesystems, self-defragmenting filesystems,
multipath-capable disk controllers, and other stuff like that to spread the I/O
across lots of disks fronted by lots of servers. Lots as in hundreds.   As in
"imagine 2 racks, each with 10 4U shelves with 60 drives per shelf, with some
beefy DDN or NetApp E-series heads in front, talking to a dozen or so servers
in front of it with multiple 10GE and Infiniband links to client machines".
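The ~200 seeks/second ceiling follows directly from the seek time. One seek every 5 ms allows at most 1000/5 = 200 per second, and at 12 ms the limit is about 83. The sketch below works through that arithmetic and then scales it to the 2 x 10 x 60 = 1200 drives in the example. It assumes that each seek moves one small block (4 KiB here, an illustrative choice, not a figure from the post). It also ignores rotational latency and transfer time, so real seek-bound numbers would be lower still.

```python
def seeks_per_second(avg_seek_ms):
    """Upper bound on random operations per second if every op needs one seek."""
    return 1000.0 / avg_seek_ms


def seek_bound_bytes_per_second(avg_seek_ms, bytes_per_seek):
    """Throughput when every request pays one full average seek."""
    return seeks_per_second(avg_seek_ms) * bytes_per_seek


BLOCK = 4096          # assumption: 4 KiB transferred per seek
DRIVES = 2 * 10 * 60  # 2 racks x 10 shelves x 60 drives, from the example above

for seek_ms in (5, 12):
    per_disk = seek_bound_bytes_per_second(seek_ms, BLOCK)
    print(f"{seek_ms:>2} ms seek: {seeks_per_second(seek_ms):6.1f} seeks/s, "
          f"{per_disk / 1e6:5.2f} MB/s per disk, "
          f"{per_disk * DRIVES / 1e9:5.2f} GB/s across {DRIVES} drives")
```

Under these assumptions, even 1200 drives doing purely random 4 KiB I/O stay under 1 GB/s in aggregate (about 0.98 GB/s at 5 ms, about 0.41 GB/s at 12 ms). Sustaining 3 gigabytes/second that way would take roughly 3e9 / 819200, or about 3,660 drives. That is why extents and striping matter: the bandwidth has to come from long sequential transfers, not from doing more seeks.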

In other words, if you're *serious* about throughput, you're gonna need a
lot more than just a better elevator.

(For the record, a big chunk of my day job is maintaining several
petabytes of storage for HPC users, where moving data at 3 gigabytes/second
is considered sluggish...)



More information about the Kernelnewbies mailing list