PCI: hotplug_event: PCIe PLDA Device BAR Reset
Bjorn Helgaas helgaas at kernel.org
Wed Feb 19 12:06:40 EST 2025

[+cc linux-acpi]
On Wed, Feb 19, 2025 at 05:52:47PM +0530, Naveen Kumar P wrote:
> Hi all,
> 
> I am writing to seek assistance with an issue we are experiencing with
> a PCIe device (PLDA Device 5555) connected through PCI Express Root
> Port 1 to the host bridge.
> 
> We have observed that after booting the system, the Base Address
> Register (BAR0) memory of this device gets reset to 0x0 after
> approximately one hour or more (the timing is inconsistent). This was
> verified using the lspci output and the setpci -s 01:00.0
> BASE_ADDRESS_0 command.
> 
> To diagnose the issue, we checked the dmesg log, but it did not
> provide any relevant information. I then enabled dynamic debugging for
> the PCI subsystem (drivers/pci/*) and noticed the following messages
> related to ACPI hotplug in the dmesg log:
> 
> [    0.465144] pci 0000:01:00.0: reg 0x10: [mem 0xb0400000-0xb07fffff]
> ...
> [ 6710.000355] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [ 7916.250868] perf: interrupt took too long (4072 > 3601), lowering
> kernel.perf_event_max_sample_rate to 49000
> [ 7984.719647] perf: interrupt took too long (5378 > 5090), lowering
> kernel.perf_event_max_sample_rate to 37000
> [11051.409115] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [11755.388727] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [12223.885715] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [14303.465636] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> 
> After these messages appear, reading the device BAR memory results in
> 0x0 instead of the expected value.
> 
> I would like to understand the following:
> 
> 1. What could be causing these hotplug_event debug messages?

This is an ACPI Notify event.  Basically the platform is telling us to
re-enumerate the hierarchy below RP01 because a device might have been
added or removed.

Unfortunately the only real information we get is the ACPI device
(RP01) and the notification value (ACPI_NOTIFY_BUS_CHECK).

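Since the device and the notification value are all we get, it may
help to at least line up when each Bus check arrived.  A minimal
sketch, assuming the debug messages keep the exact format quoted
above; the names bus_check_events() and intervals() are made up for
illustration and are not part of any kernel tooling:

```python
import re

# Matches lines such as (optionally prefixed by mail quoting "> "):
# [ 6710.000355] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
BUS_CHECK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ACPI: (?P<dev>\S+): acpiphp_glue: "
    r"Bus check in hotplug_event\(\)"
)


def bus_check_events(log_text):
    """Return (timestamp_seconds, acpi_device) for each Bus check line."""
    events = []
    for line in log_text.splitlines():
        match = BUS_CHECK.search(line)
        if match:
            events.append((float(match.group("ts")), match.group("dev")))
    return events


def intervals(events):
    """Return the gaps in seconds between consecutive Bus check events."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]


# Two of the lines from the report, used here only as sample input.
sample = r"""
[ 6710.000355] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
[11051.409115] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
"""

for ts, dev in bus_check_events(sample):
    print(f"{ts:12.6f}  {dev}")
```

Feeding it the full dmesg text instead of the sample gives a list of
timestamps that can be compared against the time the BAR was first
seen to read as 0x0.
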
You could instrument acpiphp_check_bridge() to see what path we take.
The main paths look like enable_slot() or disable_slot(), but those
both include a pr_debug() that you apparently don't see.

A remove followed by add would definitely reset the device, including
its BARs.  But you would normally see some messages related to
enumerating a new device.

If this doesn't help, try to reproduce the problem with a recent
kernel, e.g., v6.13, and post the complete dmesg log.

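To narrow down when BAR0 actually changes, something like the
following could be run periodically and its output logged next to
dmesg.  This is only a sketch: it assumes the device stays at
0000:01:00.0 as in the log above, that BAR0 is a 32-bit memory BAR,
and that the sysfs config file is readable; read_config() and
decode_header() are hypothetical helper names:

```python
import struct

# Assumed location, based on "pci 0000:01:00.0" in the quoted dmesg.
CONFIG_PATH = "/sys/bus/pci/devices/0000:01:00.0/config"

PCI_COMMAND_MEMORY = 0x2     # Memory Space Enable bit in the Command register
PCI_BASE_ADDRESS_0 = 0x10    # offset of BAR0 in the config header


def read_config(path=CONFIG_PATH, length=64):
    """Read the first bytes of PCI config space from sysfs."""
    with open(path, "rb") as f:
        return f.read(length)


def decode_header(config):
    """Decode vendor/device ID, Command and BAR0 from raw config bytes."""
    # Vendor ID at 0x00, Device ID at 0x02, Command at 0x04 (little-endian).
    vendor, device, command = struct.unpack_from("<HHH", config, 0)
    (bar0,) = struct.unpack_from("<I", config, PCI_BASE_ADDRESS_0)
    return {
        "vendor": vendor,
        "device": device,
        # All-ones usually means the device did not respond at all.
        "responding": vendor != 0xFFFF,
        "mem_enabled": bool(command & PCI_COMMAND_MEMORY),
        "bar0_raw": bar0,
        # The low four bits of a memory BAR are flags, not address bits.
        "bar0_addr": bar0 & 0xFFFFFFF0,
    }
```

Calling decode_header(read_config()) once a minute and printing the
result with a timestamp would show whether BAR0 and the Command
register change together, and whether that lines up with one of the
Bus check messages.
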
> 2. Why does this result in the BAR memory being reset?
> 3. How can we resolve this issue?
> 
> I have verified that the issue occurs even without loading the driver
> for the PLDA Device 5555, so it does not appear to be related to the
> device driver.
> 
> Any help or guidance on debugging this issue would be greatly appreciated.
> 
> Thank you for your assistance.
> 
> Best regards,
> Naveen
    
    