CPU/Mobo Weird problem with system hang every minute

lecameleon

Disciple
Hi, not sure if this is the right forum to post this, but I am simply lost at the moment. I have an Asus KGPE-D16 mobo with dual Opteron 6128s and 32 GB of registered ECC DDR3.

I get a system error every minute, endlessly. In the Windows 7 Ultimate event log I get an event ID 47, and it shows up like this:

********************************

A corrected hardware error has occurred.

Component: Memory

Error Source: Corrected Machine Check

********************************
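
For anyone who wants to pull these events out of the log in bulk instead of clicking through Event Viewer, here is a minimal sketch in Python. It only builds and runs a `wevtutil` query against the System log for WHEA-Logger event ID 47. The function names and the default count are my own placeholders, and I am assuming the events land in the System log under the provider name `Microsoft-Windows-WHEA-Logger`.

```python
import subprocess

# Assumption: WHEA events are logged under this provider in the System log.
PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_wevtutil_command(event_id=47, count=20):
    """Return the wevtutil argument list for the newest `count` matching events."""
    xpath = f"*[System[Provider[@Name='{PROVIDER}'] and (EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",  # query-events on the System log
        f"/q:{xpath}",               # XPath filter on provider and event ID
        f"/c:{count}",               # maximum number of events to return
        "/rd:true",                  # newest events first
        "/f:text",                   # plain-text output
    ]


def fetch_events(event_id=47, count=20):
    """Run the query and return the raw text output (Windows only)."""
    result = subprocess.run(
        build_wevtutil_command(event_id, count),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the affected machine, `print(fetch_events())` from an elevated prompt should dump the most recent entries with their timestamps, which makes it easier to compare them side by side.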

There is simply no problem with the RAM. I have checked several times (diagnostics, swapping etc). Every time this event occurs, everything grinds to a halt for 2-3 seconds. Audio/video stutters and nothing else works. It happens every minute, and this has been going on for a long time now. The weird part is, I installed Windows on another disk and the same error shows up there too.

I thought it was probably the ECC thing that caused this. I tried disabling ECC and the system wouldn't boot (some 0xC000000f or something error). The details in the system event log show up as Error Type 0, which is "unknown error", for Module 0x0.

Online searches turned up pages that described similar errors but with different event IDs. Nowhere was the error resolved, though. I find it maddening and I'd appreciate some help from people who know better.

Someone said it was probably an Nvidia driver error. Tried that too. Installed different drivers for the Quadro. Then there was supposedly a heating issue (how? I find that weird). That was another suggestion that was nixed. So what might be causing this error? Help.
 
Yes, thanks for helping out. I have seen a few dozen similar threads, again with different event IDs and not much explanation either. I get an event ID 47 from the WHEA Logger. It is related to memory, as I mentioned earlier.

On an HP support site I saw the problem replicated exactly like mine. The people there mentioned the ECC settings. Not sure what else I could do. Still going on and it is bugging the hell out of me.
 
Oh, probably. One of the suggestions was to upgrade the BIOS and I did that. But that is a bit inverted, isn't it? I upgraded the BIOS after this problem started. Let me check, though. Good point. Thanks, man.

EDIT: :(( Nope. One more good idea ruled out.
 
Did you try running a Linux live CD to see if it is OS related or a hardware problem? You could try sfc /scannow to see if any system files are corrupt.
 
Did that... installed Windows afresh on an SSD and tried running. Same error. So my conclusion is that it is indeed a hardware or BIOS thing. Probably a BIOS setting that I am unaware of. Or maybe I should try disabling the other PCI-e slots. The events are exactly a minute apart, so I am wondering what would take a minute to repeat each time. No fresh ideas and running out of options. Stress tests should probably have shown some kind of fault if it were purely a hardware error. No stop errors, no BSODs, nothing, just this irritating 2-3 second freeze every minute. Frustrating.
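
If it helps anyone check the "exactly a minute apart" part on their own box, here is a rough Python sketch that takes event timestamps (copied out of the event log by hand or by script) and checks whether the gaps between them sit close to 60 seconds. The function names, the 2-second tolerance and the sample timestamps are all made up for illustration. They are not taken from my actual log.

```python
from datetime import datetime


def intervals_seconds(timestamps):
    """Return the gaps, in seconds, between consecutive timestamps."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def looks_periodic(timestamps, expected=60.0, tolerance=2.0):
    """True if every gap is within `tolerance` seconds of `expected`."""
    gaps = intervals_seconds(timestamps)
    # With fewer than two events there is no gap to judge, so report False.
    return bool(gaps) and all(abs(g - expected) <= tolerance for g in gaps)


# Made-up sample: five events, one per minute.
sample = [datetime(2011, 5, 1, 10, minute, 3) for minute in range(5)]
print(intervals_seconds(sample))
print(looks_periodic(sample))
```

A strict one-minute period would point towards something on a timer (a polling routine in the firmware or a driver) rather than random faults, but that is only a guess the script cannot confirm.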
 
lecameleon said:
CPU at 100% load is around 52 and GPU (Quadro 2000) runs at around 54. You think these are high?
Those are fine. Weird problem you have. How did you check your RAM for faults? Did you run Memtest from DOS? Also try a clean sweep, using Driver Sweeper for the nVidia drivers, and reinstall. Follow the link. Just google "driver sweeper+guru3d". Use those instructions.
 
Kippy, long time :eek:hyeah: ..... dust shorting my hard drive every minute? :bleh:
Just kidding, man. This is a new system, no chance of dust shorting anything. In any case, it shows up as a memory error in the event log. But the module is unknown and the error source is unknown.

Thanks for the help, everyone. I appreciate it. However, I am disinclined to think that this is some driver error. Like I mentioned in earlier posts, I went ahead with a fresh install of Windows on a new SSD and I still got the same event. So I am at a loss, completely. I have gone over the Asus manual a million times already and looked at online descriptions of similar errors, but no real solution yet.

(Kippy you traitor, sitting in Telangana when the fun is just starting back home, Thirumathi nee Selvi JJ is going back to Poes... hehehehe). TTYL

Thanks guys, I will update you once I have worked out a solution.

--- Updated Post - Automerged ---

Additional question: Why can I not boot into Windows with ECC disabled? Strange. It stops with an error (0xC00000something).

Does this indicate that the RAM slots on the board may be defective? Or some problem with the ECC itself?
 
Dude, I am not very sure, but some hardware component may be loose in your system. I got a similar error on my PIII system (memory parity error). You can try removing any add-on PCI or PCI-E cards and then check after rebooting. I moved my two cards to different PCI slots and it is working fine now :)
 