BFQ: simple elevator

Raymond Jennings shentino at gmail.com
Wed Mar 20 19:37:41 EDT 2013


On Wed, Mar 20, 2013 at 4:10 PM,  <Valdis.Kletnieks at vt.edu> wrote:
> On Wed, 20 Mar 2013 14:41:31 -0700, Raymond Jennings said:
>
>> Suppose you have requests at sectors 1, 4, 5, and 6
>>
>> You dispatch sectors 1, 4, and 5, leaving the head parked at 5 and the
>> direction as ascending.
>>
>> But suddenly, just before you get a chance to dispatch for sector 6,
>> sector 4 gets busy again.
>>
>> I'm not proposing going back to sector 4.  It's behind us and (as you
>> indicated) we could starve sector 6 indefinitely.
>>
>> So instead, because sector 4 is on the wrong side of our present head
>> position, it is ignored and we keep marching forward, and then we hit
>> sector 6 and dispatch it.
>>
>> Once we hit sector 6 and dispatch it, we do a u-turn and start
>> descending.  That's when we pick up sector 4 again.
>
> The problem is that not all seeks are created equal.
>
> Consider the requests are at 1, 4, 5, and 199343245.  If as we're servicing
> 5, another request for 4 comes in, we may well be *much* better off doing a
> short seek to 4 and then one long seek to the boonies, rather than 2 long
> seeks.
>
> My laptop has a 160G Western Digital drive in it (WD1600BJKT).  The minimum
> track-to-track seek time is 2ms, the average time is 12ms, and the maximum is
> probably on the order of 36ms. So by replacing 2 max-length seeks with a
> track-to-track seek and 1 max-length seek, you can almost halve the delay
> waiting for seeks (38ms versus 72ms).  (And even better if the target block
> is logically before the current one but still on the same track, so you only
> take a rotational latency hit and no seek hit.)
>
> (The maximum is not given in the spec sheets, but is almost always 3 times the
> average - for a discussion of the math behind that, and a lot of other issues,
> see:
>
> http://pages.cs.wisc.edu/~remzi/OSFEP/file-disks.pdf )
>
> And of course, this interacts in very mysterious ways with the firmware
> on the drive, which can do its own re-ordering of I/O requests and/or
> manage the use of the disk's onboard read/write cache - this is why
> command queueing is useful for throughput, because if the disk has the
> option of re-ordering 32 requests, it can do more than if it only has 1 or
> 2 requests in the queue.  Of course, very deep command queues have their
> own issues - most notably that at some point you need to use barriers or
> something to ensure that the metadata writes aren't being re-ordered into
> a pattern that could cause corruption if the disk lost its mind before
> completing all the writes...
>
>> In my case I'm just concerned with raw total system throughput.
>
> See the above discussion.
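
A quick back-of-the-envelope check of the trade-off described above, using the
WD1600BJKT figures quoted (a toy cost model; real drives add rotational latency
and firmware reordering on top of this):

```python
# Toy seek-cost model using the figures quoted above:
# ~2 ms track-to-track, ~36 ms for a (near-)full-stroke seek.
TRACK_TO_TRACK_MS = 2
MAX_SEEK_MS = 36

# Strict ascending elevator: skip the late request for sector 4, seek out
# to the boonies, then seek all the way back down: two max-length seeks.
strict_elevator = MAX_SEEK_MS + MAX_SEEK_MS      # 72 ms

# Greedy nearest-first: take the short hop back to 4 first, then one
# long seek outward.
nearest_first = TRACK_TO_TRACK_MS + MAX_SEEK_MS  # 38 ms

print(strict_elevator, nearest_first)  # prints: 72 38
```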

Hmm... Maybe a hybrid approach that allows a finite number of reverse
seeks, or (as I suspect deadline does) a finite delay before abandoning
the close stuff and marching off to the boonies.
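
One way to picture the bounded-reverse-seek hybrid: allow the head to back up
only within a small window behind its current position. A toy sketch (the
function name, the window parameter, and the tie-breaking rule are all made up
for illustration, not taken from any real scheduler):

```python
def hybrid_dispatch(requests, head, backtrack_window):
    """Dispatch sectors roughly in ascending order, but permit a reverse
    seek when a pending sector lies within `backtrack_window` behind the
    head.  Returns the dispatch order.  (A toy model, not kernel code.)"""
    pending = sorted(requests)
    order = []
    while pending:
        # Candidates at or ahead of the head (the normal ascending sweep)...
        ahead = [s for s in pending if s >= head]
        # ...plus close-by sectors just behind, within the backtrack window.
        behind = [s for s in pending if head - backtrack_window <= s < head]
        if ahead and behind:
            # Take whichever is nearest; on a tie, `ahead` wins because it
            # is listed first, preserving the ascending bias.
            nxt = min(ahead + behind, key=lambda s: abs(s - head))
        elif ahead:
            nxt = min(ahead)
        elif behind:
            nxt = max(behind)
        else:
            # Everything is far behind: u-turn and sweep back down.
            nxt = max(pending)
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

# Requests at 4 (re-arrived late), 6, and the boonies; head parked at 5
# after dispatching 1, 4, 5 as in the scenario above.
print(hybrid_dispatch([4, 6, 199343245], head=5, backtrack_window=3))
# -> [6, 4, 199343245]: finish the close stuff, including one short
#    reverse seek, before paying for the single long seek.
```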

Btw, does deadline currently do ping-pong seeking, or does it always go
for the nearest sector in either direction?

At any rate, I'll probably need to do some performance tuning to get a
feel for what will work.

What I'd like to start with is correct usage of the APIs in question
to actually do the processing.  Is request dispatch well documented?

My first hunch was to copy-paste from deadline and then tweak it.
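
For what it's worth, deadline's core idea fits in a few lines: keep the
pending requests in two views at once, sector-sorted for the sweep and FIFO
for fairness, and let the FIFO win only when its head has waited too long.
A toy sketch of that decision (the function shape and tick-based timing are
illustrative, not the kernel's actual code or expiry values):

```python
def pick_next(sorted_pending, fifo, now, expire):
    """sorted_pending: sectors in ascending order (the elevator view).
    fifo: list of (enqueue_time, sector) in arrival order (the deadline
    view).  Serve the oldest request if it has expired; otherwise keep
    sweeping in sector order.  (A toy model, not the kernel's deadline.)"""
    if fifo and now - fifo[0][0] > expire:
        return fifo[0][1]     # deadline hit: jump to the starving request
    return sorted_pending[0]  # normal case: continue the ascending sweep

# The far-away request arrived first (tick 0), the close one later (tick 3).
print(pick_next([6, 199343245], [(0, 199343245), (3, 6)], now=4, expire=5))
# -> 6: nothing has expired yet, so the sweep serves the nearby sector.
print(pick_next([6, 199343245], [(0, 199343245), (3, 6)], now=9, expire=5))
# -> 199343245: the old request expired, so it jumps the queue.
```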

Mostly I want to make something simple and learn how to write an I/O
scheduler in the process.



More information about the Kernelnewbies mailing list