Best way to debug an Intel Core i5 hang - likely graphics (possibly power) related

Fredrick fjohnber at zoho.com
Tue Jan 31 01:14:03 EST 2012


On 01/30/2012 08:52 PM, Graeme Russ wrote:
> Hi Mulyadi,
>
> On Tue, Jan 31, 2012 at 3:25 PM, Mulyadi Santosa
> <mulyadi.santosa at gmail.com>  wrote:
>> Hi :)
>>
>> On Tue, Jan 31, 2012 at 10:00, Graeme Russ<graeme.russ at gmail.com>  wrote:
>>> I _think_ I've solved the problem - SDRAM Voltage
>>
>> You got my respect man, you're really stubborn :)
>>
>>> The SDRAM I am using has a rated operating voltage of 1.5V +/- 0.075.
>>> It looked like the motherboard BIOS had decided to use the upper limit
>>> of 1.575V when set to 'Auto'. I changed it to 'Manual' and set the
>>> SDRAM voltage to 1.5V and it's been running stably for the longest
>>> time it ever has.
>>
>> Thanks (again) for sharing. So this indeed has a tight relationship with
>> RAM "misbehaviour". How did you find it? Did you inspect every piece of
>> your hardware? I am curious to know (maybe others are too).
>
> The first symptom was that the screen would cycle through solid colours, so
> naturally the video 'card' was the first to be blamed. Of course, the i5
> has the video built into the CPU, so the likelihood of a fault there is
> probably minimal, so the graphics driver was next in line.
>
> So I installed an nVidia 8600GT and ran the nouveau driver (now I did get
> a glitch using this combo, but it wasn't a hang so I set that aside as a
> driver bug as well... could be related)
>
> I then installed an nVidia G210 (it's a much smaller and quieter card). I
> experienced one hang with this combination (right, now things are getting
> interesting...)
>
> In the meantime, I had tried fiddling with the IGPU voltage offset - no
> luck of course
>
> I removed my Linux hard drives and installed a spare hard drive and
> proceeded to install Windows 7 (using the on-chip Intel graphics). The
> machine hung once before the Windows 7 drivers were installed (promising)
>
> I then installed the Windows 7 drivers and started downloading 3DMark 2006
>
> ...Off to Australia Day Lunch with friends, back later...
>
> OK, so 3DMark downloaded OK and the machine was still running some 6 hours
> later :(
>
> Before getting a chance to install 3DMark, I had some other things to
> attend to... Glancing over: bright flashing colours!!! Linux had been
> exonerated :)
>
> So I took it back to the shop I bought it from (long argument about voiding
> the warranty by taking off the cover blah blah blah). They ran a stress
> test without failure. I suggested they run memtest which was met by 'Ah,
> yeah, I should have thought of that first' (and _I_ voided the warranty!)
>
> So memtest failed, they put in another pair of memory modules and memtest
> failed again. Now the plot thickens... They put the old memory back and
> memtest passed! (what the!) Then they put the new memory in and, you guessed
> it, memtest passed! So the old memory goes back in and more stress testing
> begins.
>
> It was run all day, no failure. So I went in and picked up the machine to
> take back home on the assumption that the problem was the seating of the
> memory modules - well I couldn't really fault that analysis (another
> argument about voiding warranty, 'parts still in warranty, labour to run
> the tests not', and 'Oh, it failed under Linux, must be software related,
> not covered by warranty'. Me: 'It failed before I opened the case',
> Them: 'doesn't matter, you opened the case') - Anyway, I got it back
> without paying anything mumbling 'idiots' under my breath...
>
> So I put my Linux drives back in and ran it overnight. It survived and so
> I thought the problem was solved but alas, it failed ten minutes after
> waking it up in the morning... bugger!
>
> So RAM modules not the problem, that leaves CPU, Motherboard and PSU...
>
> So I switched out the PSU - Fail (really quickly this time... interesting)
>
> So that's when I decided to look at the SDRAM voltage - I looked up the
> datasheet for the RAM and compared it to the BIOS setting... Hmm, right
> at the upper limit of the spec'd DIMM voltage, so I set it to 1.5V
> manually.
>
> Since then it has not skipped a beat (only been ~18 hours, but that's way
> longer than previously)
>
> Now if it fails again, I'm just going to buy another motherboard. If that
> works, I'm going to have a _very_ interesting time with the shop I
> bought it from (after all, the parts are under warranty hardy, har har!)
>
>> NB: it could be a good lesson that a system lockup might have
>> absolutely nothing to do with the kernel.
>
> Verily :)
>
> Regards,
>
> Graeme
>

Thank you Graeme for sharing this experience. Amazing persistence! I 
would not have gone this far. :) Sometimes you have to doubt even the 
nuts and bolts :)

-Fredrick
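
The voltage comparison Graeme describes (a DIMM rated at 1.5V +/- 0.075,
against a BIOS 'Auto' setting of 1.575V) can be written out as a small
check. This is only an illustrative sketch: the figures come from the
thread, while the function and constant names are hypothetical, and values
are kept in millivolts to avoid floating-point rounding at the limit.

```python
# Illustrative check only; figures are from the thread, names are hypothetical.
NOMINAL_MV = 1500    # rated SDRAM voltage: 1.5 V
TOLERANCE_MV = 75    # rated tolerance: +/- 0.075 V


def headroom_mv(setting_mv: int) -> int:
    """Return the margin in mV between a setting and the nearest spec limit.

    Zero means the setting sits exactly on a limit; negative means out of spec.
    """
    lower = NOMINAL_MV - TOLERANCE_MV
    upper = NOMINAL_MV + TOLERANCE_MV
    return min(setting_mv - lower, upper - setting_mv)


# Compare the BIOS 'Auto' value with the manual setting from the thread.
for label, mv in (("Auto", 1575), ("Manual", 1500)):
    print(f"{label}: {mv / 1000:.3f} V, headroom {headroom_mv(mv)} mV")
```

With these numbers the 'Auto' value is still within the rated range but has
no headroom at all, while the manual 1.5V setting leaves 75 mV on either
side. Whether that margin is the actual cause is Graeme's working theory,
not something this arithmetic proves.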

> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies





