BSOD // WinDBG log inside

Status
Not open for further replies.

m0h1t

drinks like a fish
Innovator
Hey guys, I can't seem to solve this issue by myself.

Please advise.

I am getting a BSOD during startup, either at the logon screen where one enters the password or within 2 minutes after login.

This doesn't happen every time.

My current OC: 375*9.5 @ 1.2750v + RAM at 900MHz 4-4-4-12 @ 2.1v + NB @ 1.3v.

I have successfully stress tested my overclock: 7 hours of OCCT and 4 hours of memtest86+ (6 loops).

All temps are within limits with no errors.

I got 3 BSODs in a row, one on each restart, after completing the 4 hours of memory testing. No BSOD on the 4th restart or after that.

Later today I got another BSOD on a cold boot; I restarted and everything ran fine afterwards.

I was having a similar issue in Vista Ultimate x64 before I decided to try Win7.

Below is my Windows debugger (x64) log.

Please help me identify what's causing the BSOD.

Microsoft ® Windows Debugger Version 6.11.0001.402 AMD64

Copyright © Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Windows\Minidump\030109-17234-01.dmp]

Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: SRV*c:\symbols*Symbol information

Executable search path is:

Windows 7 Kernel Version 7000 MP (2 procs) Free x64

Product: WinNt, suite: TerminalServer SingleUserTS

Built by: 7000.0.amd64fre.winmain_win7beta.081212-1400

Machine Name:

Kernel base = 0xfffff800`02a03000 PsLoadedModuleList = 0xfffff800`02c51050

Debug session time: Sun Mar 1 05:08:21.322 2009 (GMT+6)

System Uptime: 0 days 0:00:30.494

Loading Kernel Symbols

...............................................................

................................................................

..................

Loading User Symbols

Loading unloaded module list

....

*******************************************************************************

* *

* Bugcheck Analysis *

* *

*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 124, {0, fffffa8004f10038, b6200006, 80a01}

Probably caused by : hardware

Followup: MachineOwner

---------

0: kd> !analyze -v

*******************************************************************************

* *

* Bugcheck Analysis *

* *

*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)

A fatal hardware error has occurred. Parameter 1 identifies the type of error

source that reported the error. Parameter 2 holds the address of the

WHEA_ERROR_RECORD structure that describes the error conditon.

Arguments:

Arg1: 0000000000000000, Machine Check Exception

Arg2: fffffa8004f10038, Address of the WHEA_ERROR_RECORD structure.

Arg3: 00000000b6200006, High order 32-bits of the MCi_STATUS value.

Arg4: 0000000000080a01, Low order 32-bits of the MCi_STATUS value.

Debugging Details:

BUGCHECK_STR: 0x124_GenuineIntel

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT

CURRENT_IRQL: 0

STACK_TEXT:

fffff800`03fe8a98 fffff800`02ff4063 : 00000000`00000124 00000000`00000000 fffffa80`04f10038 00000000`b6200006 : nt!KeBugCheckEx

fffff800`03fe8aa0 fffff800`02b8f2c3 : 00000000`00000001 fffffa80`04f10ea0 00000000`00000000 fffffa80`04f10ef0 : hal!HalBugCheckSystem+0x1e3

fffff800`03fe8ae0 fffff800`02ff3d28 : 00000000`00000728 fffffa80`04f10ea0 fffff800`03fe8e70 fffff800`03fe8e00 : nt!WheaReportHwError+0x263

fffff800`03fe8b40 fffff800`02ff36af : fffffa80`04f10ea0 00000000`00000000 fffffa80`04f10ea0 fffff800`03fe8e70 : hal!HalpMcaReportError+0x4c

fffff800`03fe8c90 fffff800`02ff3578 : 00000000`00000002 00000000`00000001 fffff800`03fe8ef0 00000000`00000000 : hal!HalpMceHandler+0x90

fffff800`03fe8cd0 fffff800`02fe7d50 : 00000000`00000000 00000000`00000001 00000000`00000000 00000000`00000000 : hal!HalpMceHandlerWithRendezvous+0x55

fffff800`03fe8d00 fffff800`02a9d1ec : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : hal!HalHandleMcheck+0x40

fffff800`03fe8d30 fffff800`02a9d053 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxMcheckAbort+0x6c

fffff800`03fe8e70 fffff880`034db842 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x153

fffff800`03fdcc68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : intelppm!C1Halt+0x2

STACK_COMMAND: kb

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: hardware

IMAGE_NAME: hardware

DEBUG_FLR_IMAGE_TIMESTAMP: 0

FAILURE_BUCKET_ID: X64_0x124_GenuineIntel__UNKNOWN

BUCKET_ID: X64_0x124_GenuineIntel__UNKNOWN

Followup: MachineOwner

---------

Right now I'm running at stock clocks (266*9.5 + RAM at 800MHz) to see if I get any BSODs.

TIA

 
This is a machine check exception. It has nothing to do with Windows, drivers, or other software: it is raised when the hardware (yes, your CPU itself is detecting this) finds an unrecoverable error, such as memory corruption or a fatal processor error.

Two things:

1. Remove the OC. Bring it back down to stock levels.

2. Run memtest (I don't think this is really relevant, but I haven't learned all the details of MCA yet).

For more details, download and read the chapter on machine check architecture from "Intel® 64 and IA-32 Architectures Software Developer's Manual - Volume 3A: System Programming Guide"

http://download.intel.com/design/processor/manuals/253668.pdf
 
This is what one of the guys at bleepingcomputer had to say:

WHEA_UNCORRECTABLE_ERROR -- Processor reporting error: Machine Processor could not write to the system log

CURRENT_IRQL: 0 -- Processor cannot communicate with the RPC service: FATAL ERROR 0 IRQL

FAILURE_BUCKET_ID: X64_0x124_GenuineIntel__UNKNOWN <----This is the name of your Processor and the error it produced

BUCKET_ID: X64_0x124_GenuineIntel__UNKNOWN <----This is the name of your Processor and the error it produced

This may be a processor issue. You may want to get your processor tested at a local computer repair shop!

I've been running the CPU at stock since yesterday evening, let's see if I get any more bluescreens...

The memory is fine. 4 hours of memtest86 would have reported errors if something was wrong.

So should I get the CPU checked with Intel or what? I'm sure there's no physical damage to the chip, since the CPU temp has never crossed the 65°C mark.

Also, if this is a CPU issue, then how come I'm able to run the stress test for 7 hours without any errors?

@Mods - please move the thread to h/w section.
 
Download the manual I linked to (if you feel like it). Chapter 14 describes the format of MCi_Status (and hence what those error codes mean).
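
If you'd rather not decode the numbers by hand, here is a rough Python sketch that glues Arg3 and Arg4 from the bugcheck back into the 64-bit MCi_STATUS value and pulls out the fields. The bit positions and the bus/interconnect sub-field names are my reading of the architectural layout in the manual, so check them against the chapter before relying on them. The function and dictionary names (decode_mci_status, FLAG_BITS and so on) are made up for this example.

```python
# Sketch: decode the MCi_STATUS value reported in bugcheck 0x124.
# Bit positions are assumed from the architectural layout in the Intel SDM;
# all names below are hypothetical and exist only for this example.

ARG3_HIGH = 0xB6200006  # Arg3: high-order 32 bits of MCi_STATUS
ARG4_LOW = 0x00080A01   # Arg4: low-order 32 bits of MCi_STATUS

FLAG_BITS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC holds additional information
    "ADDRV": 58,  # MCi_ADDR holds the address of the error
    "PCC": 57,    # processor context may be corrupt
}

# Sub-fields of a bus and interconnect error code (0000 1PPT RRRR IILL).
PP = {0: "SRC", 1: "RES", 2: "OBS", 3: "generic"}
II = {0: "M", 1: "reserved", 2: "IO", 3: "other"}
LL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_mci_status(high, low):
    """Combine the two 32-bit halves and split out the main fields."""
    status = (high << 32) | low
    result = {
        "status": status,
        "flags": [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF, # bits 31:16
    }
    code = result["mca_error_code"]
    # Top five bits 00001 mark the bus and interconnect error pattern.
    if code & 0xF800 == 0x0800:
        result["bus_error"] = {
            "participation": PP[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
            "request": (code >> 4) & 0xF,  # raw RRRR value, 0 = generic error
            "memory_or_io": II[(code >> 2) & 0x3],
            "level": LL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    decoded = decode_mci_status(ARG3_HIGH, ARG4_LOW)
    print(hex(decoded["status"]))
    print(decoded["flags"])
    print(hex(decoded["mca_error_code"]), decoded.get("bus_error"))
```

With the values from this dump it reports VAL, UC, EN, ADDRV and PCC as set, and an MCA error code of 0xa01, which fits the bus and interconnect pattern. That agrees with the "hardware" verdict from !analyze, but on its own it doesn't tell you whether the CPU, the board, or the overclock settings are to blame.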
 