BFQ: simple elevator

Raymond Jennings shentino at gmail.com
Thu Mar 21 05:37:41 EDT 2013


On Thu, Mar 21, 2013 at 2:13 AM,  <Valdis.Kletnieks at vt.edu> wrote:
> On Wed, 20 Mar 2013 16:37:41 -0700, Raymond Jennings said:
>
>> Hmm... maybe a hybrid approach that allows a finite number of reverse
>> seeks, or, as I suspect deadline does, a finite delay before abandoning
>> the close stuff to march out to the boonies.
>
> Maybe. Maybe not.  It's going to depend on the workload - look how many times
> we've had to tweak something as obvious as cache writeback to get it to behave
> in corner cases.  You'll think you've got the algorithm right, and then the next
> guy to test-drive it will do something only 5% different and end up cratering
> the disk. :)
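
To make that concrete for myself, here's a toy user-space sketch of the
hybrid idea - serve the request nearest the head, but once the oldest
request has waited past a deadline, abandon the close stuff and go get
it.  All the names, the sample data, and the 500-tick deadline are
invented for illustration, not lifted from any real scheduler:

/* Toy model: nearest-request-first with a starvation deadline. */
#include <stdio.h>
#include <stdlib.h>

#define NREQ     8
#define DEADLINE 500    /* ticks a request may wait before it must go */

struct req {
    long sector;        /* target sector */
    long arrival;       /* tick when the request was queued */
    int  done;
};

static int pick_next(struct req *rq, int n, long head, long now)
{
    int best = -1, oldest = -1;
    long best_dist = 0;

    for (int i = 0; i < n; i++) {
        if (rq[i].done)
            continue;
        long dist = labs(rq[i].sector - head);
        if (best < 0 || dist < best_dist) {
            best = i;
            best_dist = dist;
        }
        if (oldest < 0 || rq[i].arrival < rq[oldest].arrival)
            oldest = i;
    }
    /* deadline override: abandon the close stuff for the starved one */
    if (oldest >= 0 && now - rq[oldest].arrival > DEADLINE)
        return oldest;
    return best;
}

int main(void)
{
    struct req rq[NREQ] = {
        { 100, 0 }, { 9000, 10 }, { 120, 20 }, { 130, 30 },
        { 8800, 40 }, { 140, 50 }, { 150, 60 }, { 9100, 5 },
    };
    long head = 0, now = 100;

    for (int served = 0; served < NREQ; served++, now += 200) {
        int i = pick_next(rq, NREQ, head, now);
        printf("t=%ld: dispatch sector %ld (queued at t=%ld)\n",
               now, rq[i].sector, rq[i].arrival);
        head = rq[i].sector;
        rq[i].done = 1;
    }
    return 0;
}

With that sample data the first few picks are nearest-first, and the
far-out sectors only get served once their deadline fires.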

Which, I suspect, is one reason multiple competing schedulers coexist
in the same kernel, each with its own tunables: there really isn't a
one-size-fits-all solution.
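
Both of those are visible from userspace through sysfs, for what it's
worth.  A minimal sketch of flipping them from C (the sda path, the
choice of deadline, and the 250 ms value are just examples; needs root):

/* Select the deadline elevator for one disk and tweak one tunable -
 * equivalent to echoing into the same sysfs files. */
#include <stdio.h>

static int write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");

    if (!f) {
        perror(path);
        return -1;
    }
    fputs(val, f);
    return fclose(f);
}

int main(void)
{
    /* pick the scheduler for this one queue */
    write_str("/sys/block/sda/queue/scheduler", "deadline");

    /* per-scheduler knobs live under queue/iosched/ - e.g. deadline's
     * read_expire, the ms a read may age before it must be dispatched */
    write_str("/sys/block/sda/queue/iosched/read_expire", "250");
    return 0;
}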

> Now of course, the flip side of "a disk's average seek time is between 5ms and
> 12ms depending how much you paid for it" is that there's no spinning disk on
> the planet that can do much more than 200 seeks per second (oh, and before you
> knee-jerk and say "SSD to the rescue", that's got its own issues). Right now,
> you should be thinking "so *that* is why xfs and ext4 do extents - so we can
> keep file I/O as sequential as possible with as few seeks as possible". Other
> things you start doing if you want *real* throughput: you start looking at
> striped and parallel filesystems, self-defragmenting filesystems,
> multipath-capable disk controllers, and other stuff like that to spread the I/O
> across lots of disks fronted by lots of servers. Lots as in hundreds.   As in
> "imagine 2 racks, each with 10 4U shelves with 60 drives per shelf, with some
> beefy DDN or NetApp E-series heads in front, talking to a dozen or so servers
> in front of it with multiple 10GE and Infiniband links to client machines".
>
> In other words, if you're *serious* about throughput, you're gonna need a
> lot more than just a better elevator.
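
Point taken.  Back-of-the-envelope (my own arithmetic, assuming a
garden-variety drive): at 5 ms per seek that's 1 s / 5 ms = 200 seeks/s,
and if each seek only buys a 4 KiB read, 200 * 4 KiB = 800 KiB/s of
random I/O from a disk that streams at 100+ MB/s sequentially.  Two
orders of magnitude, before the elevator even gets a vote.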

I suspect good old-fashioned learning from experience is how I'll get
my feet wet.

> (For the record, a big chunk of my day job is maintaining several
> petabytes of storage for HPC users, where moving data at 3 gigabytes/second
> is considered sluggish...)

My day job involves receiving disability from the feds for having
autism, so I have almost nothing but time on my hands.

At any rate, I suppose the best way to get started on this is to get a
grip on the APIs involved in receiving requests from above and
dispatching them below.  Is studying the deadline scheduler a good
start, or would I be better off looking at documentation?
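
For reference while I dig in, here's a minimal elevator skeleton modeled
on block/noop-iosched.c - the add_req hook is where requests arrive from
the block layer above, and the dispatch hook is where they get handed to
the driver below.  Hook signatures are roughly those of current 3.x
kernels and do drift between releases, so treat it as a sketch, not
gospel:

#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/init.h>

struct simple_data {
    struct list_head queue;     /* pending requests, FIFO order */
};

/* From above: the block layer hands us a request to hold on to. */
static void simple_add_request(struct request_queue *q, struct request *rq)
{
    struct simple_data *sd = q->elevator->elevator_data;

    list_add_tail(&rq->queuelist, &sd->queue);
}

/* From below: move one request to the dispatch queue for the driver.
 * Returns 1 if something was dispatched, 0 if the queue was empty. */
static int simple_dispatch(struct request_queue *q, int force)
{
    struct simple_data *sd = q->elevator->elevator_data;
    struct request *rq;

    if (list_empty(&sd->queue))
        return 0;

    rq = list_first_entry(&sd->queue, struct request, queuelist);
    list_del_init(&rq->queuelist);
    elv_dispatch_sort(q, rq);   /* insert sector-sorted for the drive */
    return 1;
}

static int simple_init_queue(struct request_queue *q)
{
    struct simple_data *sd;

    sd = kmalloc_node(sizeof(*sd), GFP_KERNEL, q->node);
    if (!sd)
        return -ENOMEM;
    INIT_LIST_HEAD(&sd->queue);
    q->elevator->elevator_data = sd;
    return 0;
}

static void simple_exit_queue(struct elevator_queue *e)
{
    kfree(e->elevator_data);
}

static struct elevator_type elevator_simple = {
    .ops = {
        .elevator_add_req_fn  = simple_add_request,
        .elevator_dispatch_fn = simple_dispatch,
        .elevator_init_fn     = simple_init_queue,
        .elevator_exit_fn     = simple_exit_queue,
    },
    .elevator_name = "simple",
    .elevator_owner = THIS_MODULE,
};

static int __init simple_init(void)
{
    return elv_register(&elevator_simple);
}

static void __exit simple_exit(void)
{
    elv_unregister(&elevator_simple);
}

module_init(simple_init);
module_exit(simple_exit);
MODULE_LICENSE("GPL");

Deadline builds on the same two hooks: its add_req files each request
into both a sector-sorted rbtree and a FIFO, and its dispatch serves
from the sorted tree until a FIFO deadline expires - which sounds like
exactly the hybrid behavior from earlier in the thread.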


