Suspend/resume causes I2C issues

Magnus Olsson magnus at minimum.se
Wed May 1 15:39:47 EDT 2019


Hey,

I'm trying to debug an issue on an embedded Linux system (v4.4.107) where
repeatedly and quickly suspending (mem suspend) and resuming the system
causes one of the I2C controllers in the system to malfunction. I've run
out of ideas and would like to know if anyone recognizes my issue or can
provide clues to move forward with the debugging.

Background: My system can be suspended and resumed using two buttons.
The buttons are attached to a GPIO expander, which in turn is connected
to the SoC via an I2C bus. The wake-up button acts as a wake-up source
for the kernel. When a button is pressed, the GPIO expander triggers an
interrupt and the SoC accesses the I2C bus to read out which button
was pressed. If I mash the buttons like a 2-year-old would, the system
eventually (within a minute or so) fails to suspend, with this error
from the kernel: "PM: noirq suspend of devices failed". Just before
this happens, I also see "controller timed out" errors from the
I2C controller driver in the kernel log. The device that fails to
suspend is the GPIO expander device and, if I understand the kernel code
correctly, it is because an IRQ arrived just while suspend was in
progress. The kernel tries to process the IRQ before going to sleep
but fails because the I2C controller is no longer working, so it is
unable to serve the IRQ, aborts the suspend, and resumes the system. In
a way this is correct behaviour: the kernel is going to sleep,
receives an IRQ from the wake-up source, and aborts the suspend.
BUT it does not explain why the controller gets timeouts and why it
only happens sometimes. If I suspend and resume more gently (i.e. no
spamming of buttons), it works great.

What is odd is that once the system is resumed again, the I2C controller
starts working again. But if I keep repeating the same procedure, the
system is no longer able to suspend: the failure to suspend happens every
time and the system cannot go to sleep, which is a disaster because this
is a battery-powered device. What's even worse is that sometimes the
GPIO expander (an ADP5589) stops working altogether, likely because it
uses an IRQF_ONESHOT IRQ and when we are unable to process the IRQs (due
to the broken I2C controller), the IRQ is never re-enabled. I've been
able to verify this by successfully sending I2C messages from the CLI to
the ADP5589 to poll its status, while IRQs from it no longer arrive at
the IRQ handler.
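
A user-space poll of this kind can be sketched in C roughly as follows.
This is only an illustration, not the exact commands used above: the
7-bit address 0x34 and the register index 0x01 (interrupt status) are
assumptions that should be checked against the board and the ADP5589
datasheet, and build_reg_read() and read_reg() are hypothetical helper
names. The sketch uses the Linux i2c-dev I2C_RDWR ioctl to send the
register index and then read one byte back after a repeated start.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>

/* Assumed values, for illustration only; verify against your hardware. */
#define EXPANDER_ADDR  0x34  /* assumed 7-bit I2C address of the expander */
#define REG_INT_STATUS 0x01  /* assumed interrupt status register index */

/* Hypothetical helper: describe "write register index, then read 1 byte". */
void build_reg_read(struct i2c_msg msgs[2], uint16_t addr,
                    uint8_t *reg, uint8_t *val)
{
    msgs[0].addr  = addr;      /* message 0: write the register index */
    msgs[0].flags = 0;
    msgs[0].len   = 1;
    msgs[0].buf   = reg;

    msgs[1].addr  = addr;      /* message 1: read one byte back */
    msgs[1].flags = I2C_M_RD;
    msgs[1].len   = 1;
    msgs[1].buf   = val;
}

/* Hypothetical helper: returns 0 on success, -1 on any failure. */
int read_reg(const char *dev, uint16_t addr, uint8_t reg, uint8_t *val)
{
    struct i2c_msg msgs[2];
    struct i2c_rdwr_ioctl_data xfer = { .msgs = msgs, .nmsgs = 2 };
    int fd, ret;

    build_reg_read(msgs, addr, &reg, val);

    fd = open(dev, O_RDWR);
    if (fd < 0)
        return -1;

    /* On success the ioctl returns the number of messages transferred. */
    ret = ioctl(fd, I2C_RDWR, &xfer);
    close(fd);

    return ret == 2 ? 0 : -1;
}
```

Calling read_reg() with the bus device node (for example "/dev/i2c-0",
also an assumption), EXPANDER_ADDR and REG_INT_STATUS from a small
main() would be enough to check that the chip still answers on the bus
while its interrupts are not being delivered.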

For reference, the I2C controller I'm using is the DesignWare I2C. The
driver is drivers/i2c/busses/i2c-designware-*. The GPIO expander is an
ADP5589 and the driver I'm using is
drivers/input/keyboard/adp5589-keys.c. When the issue occurs, the
controller timeout
(https://elixir.bootlin.com/linux/v4.4.107/source/drivers/i2c/busses/i2c-designware-core.c#L659)
happens because an ongoing I2C transfer (as requested by the ADP5589 IRQ
handler) does not finish within 1 second.

I have connected a logic analyzer to the I2C pins and when the
controller timeout happens, I see that both SDA and SCL are pulled low.
They are kept low until the system is resumed and the controller
recovers. At first I thought this issue was an I2C bus fault, so I tried
implementing I2C bus recovery by remuxing the SDA and SCL pins to the
GPIO controller and then pulsing SCL. However, as soon as I remux
the pins, SCL and SDA are no longer pulled low. To me this
indicates that it is not one of the slaves that is hogging the bus; it
is the master. I can also tell from the controller status registers that
when the controller timeout occurs, the controller is not in an idle
state, but it is also not getting the STOP bit interrupt or anything else
that would "complete" the transfer. It's stuck. I have looked upstream
in kernels more recent than 4.4 for fixes that would resolve this (and
there are quite a few commits that mention "controller timed out" for
the designware driver), but so far nothing has worked.
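
The reasoning behind the remux test can be illustrated with a small
user-space model in C. It is a sketch under stated assumptions, not
driver code: struct bus_model and all function names are hypothetical,
the bus is reduced to "who is pulling which line low", and the recovery
loop follows the usual convention of at most nine SCL pulses. A stuck
slave releases SDA only after some clock pulses, whereas lines held by
the master go high as soon as the pads are disconnected from it.

```c
#include <stdbool.h>

/* Hypothetical model of an open-drain I2C bus: a line is high only if
 * nobody connected to it is pulling it low. */
struct bus_model {
    bool master_pulls_scl;   /* controller block is holding SCL low */
    bool master_pulls_sda;   /* controller block is holding SDA low */
    int  slave_bits_left;    /* clocks a stuck slave still needs before
                                it releases SDA; 0 means not stuck */
    bool pins_muxed_to_gpio; /* pads disconnected from the controller */
};

bool sda_is_high(const struct bus_model *b)
{
    bool master_low = !b->pins_muxed_to_gpio && b->master_pulls_sda;
    return !master_low && b->slave_bits_left == 0;
}

bool scl_is_high(const struct bus_model *b)
{
    /* In this model only the master can hold SCL low. */
    return b->pins_muxed_to_gpio || !b->master_pulls_scl;
}

/* One SCL pulse driven from GPIO: a stuck slave shifts out one bit. */
void pulse_scl(struct bus_model *b)
{
    if (b->slave_bits_left > 0)
        b->slave_bits_left--;
}

/* Remux the pins to GPIO and pulse SCL until SDA is released.
 * Returns the number of pulses used, or -1 if SDA is still low
 * after nine pulses. */
int recover_bus(struct bus_model *b)
{
    int pulses = 0;

    b->pins_muxed_to_gpio = true;
    while (!sda_is_high(b) && pulses < 9) {
        pulse_scl(b);
        pulses++;
    }
    return sda_is_high(b) ? pulses : -1;
}
```

In this model, a slave that still has five bits to shift out needs five
pulses, while a bus held low by the master recovers with zero pulses,
which corresponds to the lines going high immediately after the remux.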

Even if I reset the whole controller (from the SoC syscontrol), it
does not work again until the system is fully resumed. Queuing new
transactions before the system is suspended only makes the controller
time out again. This makes me wonder: what other part of the system gets
suspended that makes the I2C controller malfunction? And why does it not
always happen? Isn't the suspend sequence executed the same way every
time (e.g. the order in which devices are suspended)?

Questions:

- If I call enable_irq_wake() on an IRQ, the IRQ should remain ON even
if the system is suspended. Will the kernel ensure all parent devices
are awake before it invokes the device interrupt handler to serve a
wake-up IRQ? If I put printk's in the kernel suspend code, it seems to me
that the ISR is called when more or less everything else is suspended /
turned off.

- As an experiment, I've tried modifying the I2C controller driver so
that it never goes to sleep. I just set the PM ops to NULL and
changed the request_irq flags to IRQF_NO_SUSPEND; is this sufficient to
prevent the device from going to sleep?

If anyone has ideas on how to debug this issue, I'd greatly appreciate it.


Best regards, Magnus.




