The server intermittently crashes with a BSOD and reboots on its own.
Hardware: Supermicro X8DTN+F, Xeon 5680
OS: Win 2008 std r2 (64bit) SP 2
The analysis of memory.dmp is shown below, followed by the events recorded in the event log.
Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck 124, {0, fffffa8013486028, b2000000, 175}
*** ERROR: Module load completed but symbols could not be loaded for intelppm.sys
Probably caused by : hardware
Followup: MachineOwner
---------
11: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: fffffa8013486028, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000b2000000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000000175, Low order 32-bits of the MCi_STATUS value.
Debugging Details:
------------------
BUGCHECK_STR: 0x124_GenuineIntel
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: f
STACK_TEXT:
fffff880`02265c58 fffff800`0161da3b : 00000000`00000124 00000000`00000000 fffffa80`13486028 00000000`b2000000 : nt!KeBugCheckEx
fffff880`02265c60 fffff800`017e1513 : 00000000`00000001 fffffa80`13282890 00000000`00000000 fffffa80`132828e0 : hal!HalBugCheckSystem+0x1e3
fffff880`02265ca0 fffff800`0161d700 : 00000000`00000728 fffffa80`13282890 fffff880`02266030 fffff880`02266000 : nt!WheaReportHwError+0x263
fffff880`02265d00 fffff800`0161d052 : fffffa80`13282890 fffff880`02266030 fffffa80`13282890 00000000`00000000 : hal!HalpMcaReportError+0x4c
fffff880`02265e50 fffff800`0161cf0d : 00000000`0000000c 00000000`00000001 fffff880`022660b0 00000000`00000000 : hal!HalpMceHandler+0x9e
fffff880`02265e90 fffff800`01610e88 : 00000000`00000002 fffff880`0225d180 00000000`00000000 00000000`00000000 : hal!HalpMceHandlerWithRendezvous+0x55
fffff880`02265ec0 fffff800`016cf52c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : hal!HalHandleMcheck+0x40
fffff880`02265ef0 fffff800`016cf393 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxMcheckAbort+0x6c
fffff880`02266030 fffff880`03002c61 : fffff800`016da4a9 00000000`0031ac37 fffffa80`1374dcd8 fffff880`022681c0 : nt!KiMcheckAbort+0x153
fffff880`02285c98 fffff800`016da4a9 : 00000000`0031ac37 fffffa80`1374dcd8 fffff880`022681c0 00000000`00000001 : intelppm+0x2c61
fffff880`02285ca0 fffff800`016c893c : fffff880`0225d180 fffff880`00000002 00000000`00000002 fffff800`00000000 : nt!PoIdle+0x52a
fffff880`02285d80 00000000`00000000 : fffff880`02286000 fffff880`02280000 fffff880`02285d40 00000000`00000000 : nt!KiIdleLoop+0x2c
STACK_COMMAND: kb
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: hardware
IMAGE_NAME: hardware
DEBUG_FLR_IMAGE_TIMESTAMP: 0
FAILURE_BUCKET_ID: X64_0x124_GenuineIntel_PROCESSOR_CACHE
BUCKET_ID: X64_0x124_GenuineIntel_PROCESSOR_CACHE
Followup: MachineOwner
---------
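Arg3 and Arg4 in the debugger output above are the two halves of the MCi_STATUS register, so the value can be decoded by hand to see what the CPU reported. The following is a minimal sketch in Python, assuming the architectural bit layout from the Intel SDM machine-check chapter; the function name `decode_mci_status` and the dictionary keys are made up for illustration, and model-specific bits are not interpreted.

```python
# Sketch: decode IA32_MCi_STATUS assembled from bugcheck 0x124 Arg3/Arg4.
# Assumes the architectural bit layout described in the Intel SDM
# (machine-check architecture); model-specific fields are left raw.

REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "I (instruction)", 1: "D (data)", 2: "G (generic)"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG (generic)"}


def decode_mci_status(high32: int, low32: int) -> dict:
    """Return the main architectural fields of a 64-bit MCi_STATUS value."""
    status = (high32 << 32) | low32
    mca_code = status & 0xFFFF  # bits 15:0, MCA error code
    info = {
        "status": f"0x{status:016X}",
        "VAL": bool((status >> 63) & 1),    # register contents are valid
        "OVER": bool((status >> 62) & 1),   # an earlier error was overwritten
        "UC": bool((status >> 61) & 1),     # uncorrected error
        "EN": bool((status >> 60) & 1),     # error reporting enabled
        "MISCV": bool((status >> 59) & 1),  # MCi_MISC holds valid data
        "ADDRV": bool((status >> 58) & 1),  # MCi_ADDR holds valid data
        "PCC": bool((status >> 57) & 1),    # processor context corrupt
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": f"0x{mca_code:04X}",
    }
    # Compound code for cache hierarchy errors: 000F 0001 RRRR TTLL
    if (mca_code & 0xEF00) == 0x0100:
        info["class"] = "cache hierarchy error"
        info["request"] = REQUEST.get((mca_code >> 4) & 0xF, "unknown")
        info["transaction"] = TRANSACTION.get((mca_code >> 2) & 0x3, "reserved")
        info["level"] = LEVEL[mca_code & 0x3]
    else:
        info["class"] = "other (not decoded in this sketch)"
    return info


# Values taken from Arg3 and Arg4 of the bugcheck in this post.
for key, value in decode_mci_status(0xB2000000, 0x175).items():
    print(f"{key}: {value}")
```

For the values in this dump the sketch reports an uncorrected error with PCC set, classified as a cache hierarchy error on a data-cache L1 eviction, which agrees with the PROCESSOR_CACHE bucket in the debugger output.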
source:WHEA-Logger
eventid:18
User: Local Service
A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor ID: 21
The computer has rebooted from a bugcheck. The bugcheck was: 0x00000124 (0x0000000000000000, 0xfffffa8013486028, 0x00000000b2000000, 0x0000000000000175).
No temperature threshold violation was recorded, although IPMI reports that the second processor overheated; the problem is that the processor itself is not there. The IPMI firmware is the latest version (R 2.27). CPU load stays within 40%.
Has anyone run into this problem?