Message boards : Questions and problems : BSOD
Author | Message |
---|---|
Joined: 14 Dec 09 Posts: 12 |
Hi, Been using BOINC for well over 5 years now. Have had nothing but BSODs with every version I have tried on Win 7 Ultimate x64, from 6.10.18 through 6.10.34. Machine does not BSOD with any other program loaded or running. Plenty of power and cooling. Have tested memory and CPU and no issues there. Have tried with cc_config set to No CPU usage and without the cc_config. BOINC begins to process and after about 5 minutes the machine BSODs and reboots. Did not happen with Win 7 RTM x64 on the same hardware and does not happen on a Vista x32 machine in the same network. Any thoughts? |
Joined: 14 Dec 09 Posts: 12 |
Here is a copy of the debug dump. As stated before, have had hardware components tested, and latest drivers for all components. Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64 Copyright (c) Microsoft Corporation. All rights reserved. Loading Dump File [C:\Windows\Minidump\030210-21964-01.dmp] Mini Kernel Dump File: Only registers and stack trace are available Symbol search path is: srv* Executable search path is: Windows 7 Kernel Version 7600 MP (8 procs) Free x64 Product: WinNt, suite: TerminalServer SingleUserTS Built by: 7600.16385.amd64fre.win7_rtm.090713-1255 Machine Name: Kernel base = 0xfffff800`02c4d000 PsLoadedModuleList = 0xfffff800`02e8ae50 Debug session time: Tue Mar 2 15:46:21.756 2010 (UTC - 8:00) System Uptime: 0 days 0:19:59.755 Loading Kernel Symbols ............................................................... ................................................................ ...................................... Loading User Symbols Loading unloaded module list ...... ******************************************************************************* * * * Bugcheck Analysis * * * ******************************************************************************* Use !analyze -v to get detailed debugging information. BugCheck 124, {0, fffffa800dcc3028, fe000000, 40015a} Probably caused by : hardware Followup: MachineOwner --------- 5: kd> !analyze -v ******************************************************************************* * * * Bugcheck Analysis * * * ******************************************************************************* WHEA_UNCORRECTABLE_ERROR (124) A fatal hardware error has occurred. Parameter 1 identifies the type of error source that reported the error. Parameter 2 holds the address of the WHEA_ERROR_RECORD structure that describes the error conditon. Arguments: Arg1: 0000000000000000, Machine Check Exception Arg2: fffffa800dcc3028, Address of the WHEA_ERROR_RECORD structure. 
Arg3: 00000000fe000000, High order 32-bits of the MCi_STATUS value. Arg4: 000000000040015a, Low order 32-bits of the MCi_STATUS value. Debugging Details: ------------------ BUGCHECK_STR: 0x124_GenuineIntel CUSTOMER_CRASH_COUNT: 1 DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT PROCESS_NAME: metropolis_3.1 CURRENT_IRQL: f STACK_TEXT: fffff880`03290b58 fffff800`02c16903 : 00000000`00000124 00000000`00000000 fffffa80`0dcc3028 00000000`fe000000 : nt!KeBugCheckEx fffff880`03290b60 fffff800`02dd3513 : 00000000`00000001 fffffa80`0a2e9650 00000000`00000000 fffffa80`0a2e96a0 : hal!HalBugCheckSystem+0x1e3 fffff880`03290ba0 fffff800`02c165c8 : 00000000`00000728 fffffa80`0a2e9650 fffff880`03290f30 fffff880`03290f00 : nt!WheaReportHwError+0x263 fffff880`03290c00 fffff800`02c15f1a : fffffa80`0a2e9650 fffff880`03290f30 fffffa80`0a2e9650 00000000`00000000 : hal!HalpMcaReportError+0x4c fffff880`03290d50 fffff800`02c15dd5 : 00000000`00000008 00000000`00000001 fffff880`03290fb0 00000000`00000000 : hal!HalpMceHandler+0x9e fffff880`03290d90 fffff800`02c09e88 : 00000000`00000000 00000000`00000001 00000000`00000000 00000000`00000000 : hal!HalpMceHandlerWithRendezvous+0x55 fffff880`03290dc0 fffff800`02cbd7ac : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : hal!HalHandleMcheck+0x40 fffff880`03290df0 fffff800`02cbd613 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxMcheckAbort+0x6c fffff880`03290f30 00000000`0040b32d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x153 00000000`028dfebc 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x40b32d STACK_COMMAND: kb FOLLOWUP_NAME: MachineOwner MODULE_NAME: hardware IMAGE_NAME: hardware DEBUG_FLR_IMAGE_TIMESTAMP: 0 FAILURE_BUCKET_ID: X64_0x124_GenuineIntel_PROCESSOR_CACHE BUCKET_ID: X64_0x124_GenuineIntel_PROCESSOR_CACHE Followup: MachineOwner --------- |
Joined: 29 Aug 05 Posts: 15632 |
Could you please post what the BSOD says? Is it WHEA_UNCORRECTABLE_ERROR? Could you also please post your system specifications? Could you also please state which projects you are attached to and with which you see the error happen most? (I don't know which project runs Metropolis.) So far, though, it points in the direction of a hardware error: CPU/FPU, or an incorrect driver for some hardware. That is specifically because of what the debug output says: "Probably caused by : hardware". Are you running with an Nvidia CUDA GPU? If so, with what driver set? Have you tried a previous driver set? Are you running with Windows 7 drivers only? |
Joined: 14 Dec 09 Posts: 12 |
Hi, Thanks for the reply. I have tried all drivers, from the WDM/Windows native drivers to everything from the 191 series to the latest 196.75 drivers. All with the same results. Since I have tried with no GPUs used through the cc_config.xml as well as with the GPUs enabled, with the same results, I am wondering why the video driver would be of concern. Is the app still trying to use some portion of the video even when told not to? I agree the debug points to hardware, but having had all hardware tested, I disagree. It's weird, the only time I have had any issue with this machine is when I try to run BOINC and its subsequent science apps. The BSOD says exactly what the text in the previous post shows. That is a printout of the .dmp. I am attaching to Rosetta, World Community Grid, Spinhenge, Climate Prediction, and SETI. System Specs: ASUS P67T WS Motherboard; Intel i7 975 Extreme Processor (No overclocking); 12 GB G. Skill 16000CL9T DDR3 Memory (Running at 2000 per XMP); 3x XFX 280GTX XXX video cards in SLI; 4x Seagate 500GB ST3500630NS in RAID 0+1 from ICH10R on MB; PC Power&Cooling 1200W PSU (Single Rail 100A); Lian-Li (Lancool) PC-K62 Case with tons of fans. Drivers: Intel .inf install 9.1.1.1025; Intel Rapid Store Driver 9.5.0.1037; NVIDIA 196.75; Soundmax Driver 6.10.2.6585; Realtek NIC driver 7.12.1218.2009; Realtek NIC Teaming Driver 6.8.1024.2008. Thanks. Z |
Joined: 29 Aug 05 Posts: 15632 |
Since you also get the error when you do not use the GPU, it's probably your CPU that's the problem. Any of the projects you mention will stress the CPU to the limit, so any (minor) damage it has will turn into a problem. Can you please test with Prime95 if it reacts the same? If it does, it's your CPU. You can also do a test of your memory, with Memtest86+ to see if that's the culprit. |
Joined: 14 Dec 09 Posts: 12 |
Will try Prime 95, have run extensive memory tests without issue already. Also, I have the client set to use no more than 75% of the processor time so I am not peaking it out, but will try the Prime95 and see what happens. Z |
Joined: 14 Dec 09 Posts: 12 |
Well, that was quick. Prime95 caused the exact same error in less than 1 minute. Am contacting Intel now for RMA on CPU. Thanks for your help, I appreciate it. Z |
Joined: 29 Aug 05 Posts: 15632 |
You're welcome, and I'm sorry it is the CPU. But at least that's what BOINC is used for as well, stress testing. If there are shortcomings in the CPU then BOINC will find them. |
Joined: 14 Dec 09 Posts: 12 |
Ok, spent the better part of the afternoon on the phone with Intel Tech Support. They claim Prime95's failure in no way indicates a failed CPU and refuse to RMA it on just those grounds. Any other ideas? Thank you. Z |
Joined: 29 Aug 05 Posts: 15632 |
What do they claim that's causing it then, cosmic rays? You did tell them that BOINC projects also cause this BSOD? What else should you test it with then, according to them? Anyway, here's another CPU stress testing kit: http://downloads.guru3d.com/IntelBurnTest-v1.6-download-2047.html See what that one does. |
Joined: 14 Dec 09 Posts: 12 |
Thank you. I will download and test with this also. Not sure what they wanted besides a "qualified shop" to have tested and drawn the same conclusion. If I get the same results with this package I will recontact them and be more insistent on resolution. Thank you again for the help with this. Z |
Joined: 29 Aug 05 Posts: 15632 |
I discussed this with one of the BOINC developers. He says it's very unlikely, but worth checking anyway, that it may be a rootkit. Have you thoroughly scanned your system with a good AV package, after you booted off a CD? |
Joined: 14 Dec 09 Posts: 12 |
Hi Jord, Thank you for all of the advice. I am running Kaspersky IS 2010 and it is in date and up to date. No nasties detected. For grins, I borrowed my buddy's processor, same exact model, from his system to test. We have identical setups except that he runs CrossfireX and I run SLI, and he runs an LSI SATA III RAID card in RAID 0 with three 6 Gb/s drives while I run MB-based Intel RAID in RAID 10 with four 3 Gb/s drives. Exact same BSOD with his proc in my box, and no BSOD when we run BOINC on his box. On a side note, after I ran the second test software and repeated the BSOD, Intel agreed to RMA the processor, but it looks as if that was unnecessary. Still trying to figure it out. Thanks again for all the help. Z |
Joined: 29 Aug 05 Posts: 15632 |
Ah... that moves the problem from CPU to motherboard. I'd forgotten about the motherboard, to be honest. Unless you found a production flaw in that specific model of the Intel CPU, but that's even more unlikely than it being a rootkit. ;-) |
Joined: 14 Dec 09 Posts: 12 |
Time to contact the next vendor. Thanks for your patience and direction. I am looking forward to nailing this down. Z |
Joined: 14 Dec 09 Posts: 12 |
So, before I unpacked the entire box to work on an RMA for the motherboard, I fiddled with the voltage settings. Up to this point everything has been running at stock. I found that if I increase the CPU and QPI voltages I can get Prime95 to finish without a BSOD, but the system will BSOD some random time after that with a different hardware error. So I am leaning towards ASUS running things on the lean side, power-wise, for that much processing. I am going to fiddle with this a bit more before I take it all apart. Z |
Joined: 29 Aug 05 Posts: 15632 |
Just out of curiosity, what did the other BSOD say? |
Joined: 14 Dec 09 Posts: 12 |
I'll run it today and get the error. |
Joined: 14 Dec 09 Posts: 12 |
Cannot get it to throw the same error again. The issue is directly related to the XMP profile in the memory. If I turn off XMP in the BIOS, no issues. If I adjust the memory settings to what the manufacturer claims works, BSOD. Have run Memtest86 and MS Memory Tester, and taken the sticks to a local shop to be tested on a memory testing machine they have, and they all check out. Guess it is stock, wimpy memory timings and speed for the time being. Thank you for all of the help with this. Z |
Joined: 29 Aug 05 Posts: 15632 |
"Guess it is stock, wimpy memory timings and speed for the time being." Which is why I don't mind throwing an extra buck at it and getting something more expensive from Kingston, Crucial or A-Data. But the previous error should still be stored in your Windows event log (Start->Control Panel->Administrative Tools->Event Viewer), under either system or application errors. |
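
For anyone reading the debug dump earlier in the thread: Arg3 and Arg4 of bugcheck 124 are the high and low halves of the CPU's MCi_STATUS register, and they can be decoded by hand. The sketch below is not from the original posters. It assumes the flag bit positions and the compound "cache hierarchy" error code pattern (000F 0001 RRRR TTLL) as I understand them from Intel's machine-check architecture documentation, and the function names are made up for illustration. Under those assumptions the value decodes to an uncorrected cache hierarchy error at level L2, which is consistent with the PROCESSOR_CACHE failure bucket in the dump. It says nothing about whether the CPU, the motherboard, or the memory settings provoked it.

```python
# Sketch: decode the MCi_STATUS value reported by bugcheck 0x124 in the dump.
# ASSUMPTION: bit positions follow Intel's machine-check architecture layout.

ARG3_HIGH = 0xFE000000  # Arg3 from the dump: high-order 32 bits of MCi_STATUS
ARG4_LOW = 0x0040015A   # Arg4 from the dump: low-order 32 bits of MCi_STATUS

# Status flags in the upper bits of the 64-bit register (assumed layout).
FLAG_BITS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC holds extra information
    "ADDRV": 58,  # MCi_ADDR holds an address
    "PCC": 57,    # processor context may be corrupt
}

CACHE_LEVELS = ("L0", "L1", "L2", "generic")


def decode_mci_status(high, low):
    """Combine the two 32-bit halves and split out the main fields."""
    status = (high << 32) | low
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {
        "status": status,
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


def is_cache_hierarchy_error(mca_code):
    """Match 000F 0001 RRRR TTLL, ignoring the filtering bit F (bit 12)."""
    return (mca_code & 0xEF00) == 0x0100


def cache_level(mca_code):
    """The two lowest bits (LL) name the cache level."""
    return CACHE_LEVELS[mca_code & 0x3]


if __name__ == "__main__":
    decoded = decode_mci_status(ARG3_HIGH, ARG4_LOW)
    print(f"MCi_STATUS = {decoded['status']:#018x}")
    print("flags set:", [n for n, v in decoded["flags"].items() if v])
    code = decoded["mca_error_code"]
    if is_cache_hierarchy_error(code):
        print("cache hierarchy error, level", cache_level(code))
```

The same two functions can be pointed at the Arg3/Arg4 pair from any later 0x124 dump, for example the "different hardware error" seen after raising the voltages, to check whether it lands in the same error class.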
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.