System can't boot properly because of block device state

Alvin Abitria abitria.alvin at gmail.com
Sat Aug 1 06:40:44 EDT 2015


I'm having a problem where the server hosting a PCI device we're
developing gets stuck during boot and never reaches the login screen.
I'm developing a Linux driver for that PCI device (a block device, an
SSD actually).  In our tests, when the device has undergone heavy
usage, it initializes _very_slowly_ when the system is powered on: so
slowly that the rest of the system is done booting while the device
init is still in progress.  That slowness is of course in the FW.  On
the driver side, we employ a mechanism to hold off the system's
attempts to read the device during the block layer's add_disk(), when
the module is loaded, IF the device has not finished initializing.
Once the device is ready, the driver lets those pending reads proceed,
and that completes the block driver module load.  In the normal case,
when the device is not heavily used, this delaying tactic works during
bootup: the device finishes its initialization and the system boots
smoothly.  But when the device is heavily used, the driver still
finishes loading after the very slow device init completes, but by
that time the system just keeps cycling through some USB drivers and
can't get to the login screen.  Some systems even drop into emergency
mode because of the long delay on the device side.  This was observed
on RHEL 7.
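
To make the hold-off idea concrete, here is a minimal userspace sketch
of it, not our actual driver code.  All demo_* names are hypothetical,
and a real driver would hold block-layer requests rather than plain
sector numbers.  Reads that arrive before the device is ready are
parked, then released in one go once init completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PENDING 16  /* arbitrary limit chosen for this sketch */

/* Hypothetical model of the driver state; none of this is a kernel API. */
struct demo_dev {
    bool   ready;                      /* true once device init is done */
    int    pending[DEMO_MAX_PENDING];  /* sectors of reads being held   */
    size_t n_pending;                  /* number of reads being held    */
    size_t n_completed;                /* number of reads completed     */
};

void demo_init(struct demo_dev *d)
{
    assert(d != NULL);
    d->ready = false;
    d->n_pending = 0;
    d->n_completed = 0;
}

/*
 * Models a read arriving during add_disk().
 * Returns 1 if it completed at once, 0 if it was held because the
 * device is not ready yet, and -1 if the hold list is full.
 */
int demo_submit_read(struct demo_dev *d, int sector)
{
    assert(d != NULL);
    if (d->ready) {
        d->n_completed++;
        return 1;
    }
    if (d->n_pending == DEMO_MAX_PENDING)
        return -1;
    d->pending[d->n_pending++] = sector;
    return 0;
}

/*
 * Models the "device init complete" event: every held read is allowed
 * to proceed.  Returns the number of reads that were released.
 */
size_t demo_mark_ready(struct demo_dev *d)
{
    assert(d != NULL);
    size_t released = d->n_pending;

    d->ready = true;
    d->n_completed += released;
    d->n_pending = 0;
    return released;
}
```

In this model, calling demo_submit_read() twice before
demo_mark_ready() leaves both reads held, and demo_mark_ready() then
releases both.  The sketch only shows the ordering; it says nothing
about how long the rest of the boot is willing to wait, which is the
actual problem.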

My question is: is there a way for my block driver to tell the system
to wait for it, so that bootup still completes normally?  Something
like "I'm busy, please wait for me..."  Yes, the root cause must be in
the FW, but let's assume that changing it at this point would be very
costly and very risky.  So far I've tried blk_stop_queue() and
blk_start_queue() in an attempt to hold off those read requests during
add_disk(), but the results are still the same.  blk_requeue_request()
was also tried, with the same results.  Those functions deal more with
internal management of the request_queue, and it seems they have
nothing to do with communicating my intention to the system.  I know
I'm asking for a quick fix and not the cleanest solution, but I'd like
your thoughts on whether this is possible.

Thanks!


