Best way to debug an Intel Core i5 hang - likely graphics (possibly power) related

Graeme Russ graeme.russ at gmail.com
Mon Jan 30 23:52:21 EST 2012


Hi Mulyadi,

On Tue, Jan 31, 2012 at 3:25 PM, Mulyadi Santosa
<mulyadi.santosa at gmail.com> wrote:
> Hi :)
>
> On Tue, Jan 31, 2012 at 10:00, Graeme Russ <graeme.russ at gmail.com> wrote:
>> I _think_ I've solved the problem - SDRAM Voltage
>
> You got my respect man, you're really stubborn :)
>
>> The SDRAM I am using has a rated operating voltage of 1.5V +/- 0.075.
>> It looked like the motherboard BIOS had decided to use the upper limit
>> of 1.575V when set to 'Auto'. I changed it to 'Manual' and set the
>> SDRAM voltage to 1.5V and it's been running stably for the longest
>> time it ever has.
>
> Thanks (again) for sharing. So this indeed has tight relationship with
> RAM "misbehaviour". How do you know it? Do you inspect every piece of
> your hardware? I am curious to know (maybe others too).

The first symptom was that the screen would cycle through solid colours, so
naturally the video 'card' was the first to be blamed. Of course, the i5
has the video built into the CPU, so the likelihood of a hardware fault
there is probably minimal, which put the graphics driver next in line.

So I installed an nVidia 8600GT and ran the nouveau driver (now I did get
a glitch using this combo, but it wasn't a hang so I set that aside as a
driver bug as well... could be related)

I then installed an nVidia G210 (it's a much smaller and quieter card). I
experienced one hang with this combination (right, now things are getting
interesting...)

In the meantime, I had tried fiddling with the IGPU voltage offset - no
luck of course

I removed my Linux hard drives and installed a spare hard drive and
proceeded to install Windows 7 (using the on-chip Intel graphics). The
machine hung once before the Windows 7 drivers were installed (promising)

I then installed the Windows 7 drivers and started downloading 3DMark 2006

...Off to Australia Day Lunch with friends, back later...

OK, so 3DMark downloaded OK and the machine was still running some 6 hours
later :(

Before getting a chance to install 3DMark, I had some other things to
attend to... Glancing over, I saw bright flashing colours!!! Linux had
been exonerated :)

So I took it back to the shop I bought it from (long argument about voiding
the warranty by taking off the cover blah blah blah). They ran a stress
test without failure. I suggested they run memtest which was met by 'Ah,
yeah, I should have thought of that first' (and _I_ voided the warranty!)

So memtest failed, they put in another pair of memory modules and memtest
failed again. Now the plot thickens... They put the old memory back and
memtest passed! (what the!) then they put the new memory in and, you guessed
it, memtest passed! So the old memory goes back in and more stress testing
begins.

It was run all day, no failure. So I went in and picked up the machine to
take back home on the assumption that the problem was the seating of the
memory modules - well I couldn't really fault that analysis (another
argument about voiding warranty, 'parts still in warranty, labour to run
the tests not', and 'Oh, it failed under Linux, must be software related,
not covered by warranty'. Me: 'It failed before I opened the case',
Them: 'doesn't matter, you opened the case') - Anyway, I got it back
without paying anything mumbling 'idiots' under my breath...

So I put my Linux drives back in and ran it overnight. It survived and so
I thought the problem was solved but alas, it failed ten minutes after
waking it up in the morning... bugger!

So RAM modules not the problem, that leaves CPU, Motherboard and PSU...

So I switched out the PSU - Fail (really quickly this time... interesting)

So that's when I decided to look at the SDRAM voltage - I looked up the
datasheet for the RAM and compared it to the BIOS setting... Hmm, right
at the upper limit of the spec'd DIMM voltage, so I set it to 1.5V
manually.
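
As a rough sketch of that comparison, assuming only the datasheet figures
quoted above (1.5V nominal, +/- 0.075V): the function name and the use of
integer millivolts are illustrative choices, not anything the BIOS or the
datasheet provides.

```python
def voltage_margin_mv(setting_mv, nominal_mv=1500, tolerance_mv=75):
    """Return (low, high, headroom) in millivolts for a DIMM voltage setting.

    headroom is the distance from the setting to the nearest rated limit;
    zero means the setting sits exactly on a limit, negative means it is
    outside the rated range. Integer millivolts avoid float rounding noise.
    """
    low = nominal_mv - tolerance_mv
    high = nominal_mv + tolerance_mv
    headroom = min(setting_mv - low, high - setting_mv)
    return low, high, headroom


# Compare the BIOS 'Auto' value (1.575V) with the manual setting (1.5V).
for label, setting in (("Auto", 1575), ("Manual", 1500)):
    low, high, headroom = voltage_margin_mv(setting)
    print(f"{label}: {setting}mV, rated {low}-{high}mV, headroom {headroom}mV")
```

With these numbers the 'Auto' value has no headroom at all, while the
manual 1.5V setting sits 75mV from either limit.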

Since then it has not skipped a beat (only been ~18 hours, but that's way
longer than previously)

Now if it fails again, I'm just going to buy another motherboard. If that
works, I'm going to have a _very_ interesting time with the shop I
bought it from (after all, the parts are under warranty hardy, har har!)

> NB: it could be a good lesson that system lock up might have
> absolutely nothing to do with kernel.

Verily :)

Regards,

Graeme


