Please Help Me Interpret a Memory.DMP

Discussion in 'Techforge' started by Scott Hamilton Robert E Ron Paul Lee, Dec 5, 2010.

  1. Scott Hamilton Robert E Ron Paul Lee

    Scott Hamilton Robert E Ron Paul Lee Straight Awesome

    Joined:
    Jan 5, 2008
    Messages:
    29,016
    Location:
    TN
    Ratings:
    +14,152
    My server is rebooting unexpectedly. Several times I've thought I had it under control, only to be surprised by it doing exactly what I thought I'd fixed. I've updated every driver on my system to the latest revision. I've updated the BIOS and firmware, including firmware for the RAID card, the backplane, etc. (and I think I have done some good here, honestly). But as I was sitting here tonight, pleased with myself for fixing things and learning... reboot.

    Code:
    Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
    Copyright (c) Microsoft Corporation. All rights reserved.
     
    Loading Dump File [C:\WINDOWS\MEMORY.DMP]
    Kernel Summary Dump File: Only kernel address space is available
    Symbol search path is: C:\WINDOWS\Symbols
    Executable search path is: 
    Missing image name, possible paged-out or corrupt data.
    *** WARNING: Unable to verify timestamp for Unknown_Module_52335000
    *** ERROR: Module load completed but symbols could not be loaded for Unknown_Module_52335000
    Debugger can not determine kernel base address
    Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
    Product: LanManNt, suite: SmallBusiness TerminalServer SmallBusinessRestricted SingleUserTS
    Built by: 3790.srv03_sp2_gdr.100216-1301
    Machine Name:
    Kernel base = 0x80800000 PsLoadedModuleList = 0x808a6ea8
    Debug session time: Sat Dec 4 06:42:22.602 2010 (GMT-6)
    System Uptime: 0 days 0:12:36.216
    Page 146d35 too large to be in the dump file.
    Page 1215ea5 too large to be in the dump file.
    Page 1cbea3d too large to be in the dump file.
    Missing image name, possible paged-out or corrupt data.
    *** WARNING: Unable to verify timestamp for Unknown_Module_52335000
    *** ERROR: Module load completed but symbols could not be loaded for Unknown_Module_52335000
    Debugger can not determine kernel base address
    Loading Kernel Symbols
    Missing image name, possible paged-out or corrupt data.
    .Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************
    Use !analyze -v to get detailed debugging information.
    BugCheck 7F, {8, f7727fe0, 0, 0}
    ***** Debugger could not find nt in module list, module list might be corrupt, error 0x80070057.
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    Page 146d35 too large to be in the dump file.
    GetContextState failed, 0x80004002
    Unable to read selector for PCR for processor 1
    Page 1215ea5 too large to be in the dump file.
    GetContextState failed, 0x80004002
    Unable to read selector for PCR for processor 2
    Page 1cbea3d too large to be in the dump file.
    GetContextState failed, 0x80004002
    Unable to read selector for PCR for processor 3
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    Probably caused by : Unknown_Image ( ANALYSIS_INCONCLUSIVE )
    Followup: MachineOwner
    ---------
    0: kd> !analyze -v
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************
    UNEXPECTED_KERNEL_MODE_TRAP (7f)
    This means a trap occurred in kernel mode, and it's a trap of a kind
    that the kernel isn't allowed to have/catch (bound trap) or that
    is always instant death (double fault). The first number in the
    bugcheck params is the number of the trap (8 = double fault, etc)
    Consult an Intel x86 family manual to learn more about what these
    traps are. Here is a *portion* of those codes:
    If kv shows a taskGate
    use .tss on the part before the colon, then kv.
    Else if kv shows a trapframe
    use .trap on that value
    Else
    .trap on the appropriate frame will show where the trap was taken
    (on x86, this will be the ebp that goes with the procedure KiTrap)
    Endif
    kb will then show the corrected stack.
    Arguments:
    Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
    Arg2: f7727fe0
    Arg3: 00000000
    Arg4: 00000000
    Debugging Details:
    ------------------
    ***** Debugger could not find nt in module list, module list might be corrupt, error 0x80070057.
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    Page 146d35 too large to be in the dump file.
    GetContextState failed, 0x80004002
    Unable to read selector for PCR for processor 1
    Page 1215ea5 too large to be in the dump file.
    GetContextState failed, 0x80004002
    Unable to read selector for PCR for processor 2
    Page 1cbea3d too large to be in the dump file.
    GetContextState failed, 0x80004002
    Unable to read selector for PCR for processor 3
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    Missing image name, possible paged-out or corrupt data.
    Unable to read KLDR_DATA_TABLE_ENTRY at 00000000 - NTSTATUS 0xC0000147
    WARNING: .reload failed, module list may be incomplete
    BUGCHECK_STR: 0x7f_8
    DEFAULT_BUCKET_ID: DRIVER_FAULT
    CURRENT_IRQL: 0
    LAST_CONTROL_TRANSFER: from ffdffee0 to ba90cca2
    STACK_TEXT: 
    WARNING: Frame IP not in any known module. Following frames may be wrong.
    8089a5e0 ffdffee0 8086f007 ffdff000 88d7e9a0 0xba90cca2
    8089a600 8088de52 00000000 0000000e 00000000 0xffdffee0
    8089db40 00000000 8089db48 8089db48 8089db50 0x8088de52
     
    STACK_COMMAND: kb
    SYMBOL_NAME: ANALYSIS_INCONCLUSIVE
    FOLLOWUP_NAME: MachineOwner
    MODULE_NAME: Unknown_Module
    IMAGE_NAME: Unknown_Image
    DEBUG_FLR_IMAGE_TIMESTAMP: 0
    BUCKET_ID: CORRUPT_MODULELIST
    Followup: MachineOwner
    ---------
    
    I've run 3 memory tests from Hiren's disk. I'm downloading ultimate boot disk... but if anyone here can help point me towards a solution, I would appreciate it.

    I'm prepared to swap out the motherboard and memory if I can't figure anything out.
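
    For anyone reading along, the one solid fact in that output is the line "BugCheck 7F, {8, f7727fe0, 0, 0}". The debugger text itself says the first parameter is the trap number and that 8 means double fault. Below is a rough Python sketch of pulling those numbers out of a saved debugger log. The function names and the trap table are my own illustration, not part of any debugger tool, and the table is only a portion of the x86 trap numbers.

```python
import re

# A portion of the x86 trap numbers that can appear as the first
# parameter of bug check 0x7F (illustrative table, not exhaustive).
X86_TRAPS = {
    0x0: "divide error",
    0x4: "overflow",
    0x5: "bounds check fault",
    0x6: "invalid opcode",
    0x8: "double fault",
    0xD: "general protection fault",
    0xE: "page fault",
}

# Matches lines such as: BugCheck 7F, {8, f7727fe0, 0, 0}
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(text):
    """Return (code, [args]) as integers, or None if no BugCheck line exists."""
    match = BUGCHECK_RE.search(text)
    if match is None:
        return None
    code = int(match.group(1), 16)
    args = [int(arg.strip(), 16) for arg in match.group(2).split(",")]
    return code, args


def describe(text):
    """Summarize the bug check in one line (hypothetical helper)."""
    parsed = parse_bugcheck(text)
    if parsed is None:
        return "no BugCheck line found"
    code, args = parsed
    if code == 0x7F and args:
        trap = X86_TRAPS.get(args[0], "unlisted trap")
        return f"0x7F UNEXPECTED_KERNEL_MODE_TRAP, trap 0x{args[0]:X} ({trap})"
    return f"bug check 0x{code:X}"


print(describe("BugCheck 7F, {8, f7727fe0, 0, 0}"))
```

    This only decodes the numbers. It says nothing about which driver caused the double fault, which the debugger could not determine here because the module list failed to load.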
  2. Scott Hamilton Robert E Ron Paul Lee

    Scott Hamilton Robert E Ron Paul Lee Straight Awesome

    Could a file system error (source NtFrs, event 13568) cause a double fault in any way? Basically it is a replica error. I've been treating this as a secondary problem, but for some reason that just popped into my head.
  3. skinofevil

    skinofevil Fresh Meat

    Joined:
    Oct 23, 2009
    Messages:
    12,880
    Location:
    91367
    Ratings:
    +3,684
    Looks like a really, really long-winded version of:

    Get your :chris: hat and magnifying glass, and welcome to Microsoft's idea of productivity.
  4. Caboose

    Caboose ....

    Joined:
    Mar 29, 2004
    Messages:
    17,782
    Location:
    Mission Control
    Ratings:
    +9,489
    Dude, I concluded long ago that anything related to programming issues will fall upon deaf ears, since the ones who could answer/assist have spent numerous hours and thousands of dollars to get to that perch they rest upon.

    Likely if you look up all you'll see is a toothy grin.

    Just sayin'. :lol:
  5. Muad Dib

    Muad Dib Probably a Dual Deceased Member

    Joined:
    May 4, 2004
    Messages:
    53,665
    Ratings:
    +23,779
    It sounds like you have a technical problem and require professional assistance.
  6. Kyle

    Kyle You will regret this!

    Joined:
    Mar 29, 2004
    Messages:
    9,150
    Location:
    California?!?!
    Ratings:
    +2,814
    As if Mac OS X/Unix/Linux kernel panics are any better. When serious shit goes down on your computer, the resulting information about it is going to be serious. If it wasn't something complicated, off-the-wall, and awful, don't you think they would have caught the issue?
  7. Order2Chaos

    Order2Chaos Ultimate... Immortal Administrator

    Joined:
    Apr 2, 2004
    Messages:
    25,222
    Location:
    here there be dragons
    Ratings:
    +21,472
    Well, the panic.log file usually quite handily points out the offending kernel extension or library, and occasionally points you in the right direction in hardware failure cases.
  8. skinofevil

    skinofevil Fresh Meat

    Yup. At least with Linux/OS X, the OS has some idea of why it just shit itself. Windows is the dumb blonde of OSes.
  9. Scott Hamilton Robert E Ron Paul Lee

    Scott Hamilton Robert E Ron Paul Lee Straight Awesome

    Actually, I've learned quite a bit about debugging in the last 48 hours, and I've found a great program called "BlueScreenView". I'm seeing an error that points to a driver for one of the server's NICs, and I'm gonna take a two-pronged approach:
    1. Update said driver.
    2. Uncheck power saving for said driver - I believe the resume portion of the driver is the culprit.
  10. Bickendan

    Bickendan Custom Title Administrator Faceless Mook Writer

    Joined:
    May 7, 2010
    Messages:
    24,047
    Ratings:
    +28,732
    You mean, this :bigass: toothy grin?
    :?:

    I had a BSOD last night, but the laptop reset itself before I could see just what it was whining about. Triggered, of all things, by moving the laptop and setting it back down :jayzus:
  11. Scott Hamilton Robert E Ron Paul Lee

    Scott Hamilton Robert E Ron Paul Lee Straight Awesome

    That's almost certainly a hardware fault. I would estimate that you are dealing with a system that has gotten very, very hot and has some components that are bordering on being "unballed." I steal that term from the Xbox 360.

    Mine got the E74 error (1 red light, not 3) several weeks ago. There are a few ghetto fixes, but basically what causes the problem is that the chip gets so hot that the soldering between the chip and the board becomes undone.

    The fix was to get a heat gun and nuke that chip, get it hot enough, and let it "reball" (again, a 360 term I didn't coin).

    The reason your problem reminds me of it is that there are a few ghetto fixes for this issue:
    1. Stand on the 360 while it's hot (some people claim it works).
    2. Put a bunch of pennies (I forget how many) together with some type of thermal glue, so that when the case is reassembled, it puts pressure on the chip/mobo surface area, negating the unreliable solder.

    It isn't that hard for laptops to overheat, as a lot of people cover the fan openings on them without realizing it.
  12. Scott Hamilton Robert E Ron Paul Lee

    Scott Hamilton Robert E Ron Paul Lee Straight Awesome

    Oh, and even though I'm pretty sure this has something to do with the Intel RAID card, I'm waiting on Microsoft to call me back. There is a small tech company in Memphis that I sometimes make emergency calls for if all of their other people are tied up/vacationing, and I was running this by them today.

    They stated that since I've been their employee, I could use their Microsoft partner account. Nice.