  1. #1
    Hi there,

    I had an issue on a Dell R610 server with 4x4 GB of ECC RAM.
    The machine runs Debian Linux and was filling /var/log/messages with:

    Apr 25 18:24:37 imu253 kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 3.
    Apr 25 18:24:37 imu253 kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 5.
    Apr 25 18:24:37 imu253 kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 6.
    Apr 25 18:24:37 imu253 kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 1.
    Apr 25 18:24:37 imu253 kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
    Apr 25 18:24:37 imu253 kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 4.
    Apr 25 18:24:37 imu253 kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 2.
    Apr 25 18:24:37 imu253 kernel: Bank 8: cc0001800001009f
    Apr 25 18:24:37 imu253 last message repeated 6 times

    every 20 seconds.
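
    For anyone who wants to read the "Bank 8" value in the log above, here is a minimal sketch that splits it into the architectural fields of an x86 machine-check status register (IA32_MCi_STATUS). The bit positions are the ones I understand from the Intel SDM; the function names are my own, and the corrected-error-count field (bits 52:38) is only meaningful on CPUs that implement it, so treat that number as an assumption.

```python
# Sketch: decode the architectural fields of an IA32_MCi_STATUS value.
# Helper names are hypothetical; bit layout assumed from the Intel SDM.

MEMORY_TRANSACTIONS = {
    0: "generic undefined request",
    1: "memory read",
    2: "memory write",
    3: "address/command",
    4: "memory scrubbing",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine-check bank status word into named fields."""

    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),            # VAL: the register holds a logged error
        "overflow": bit(62),         # OVER: another error arrived before this one was cleared
        "uncorrected": bit(61),      # UC: hardware could NOT correct the error
        "enabled": bit(60),          # EN: error reporting was enabled for this error
        "misc_valid": bit(59),       # MISCV: the MISC register has extra info
        "addr_valid": bit(58),       # ADDRV: the ADDR register has an address
        "context_corrupt": bit(57),  # PCC: processor context may be corrupt
        # Assumption: only valid on CPUs that implement the count field.
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


def describe_mca_code(code: int) -> str:
    """Describe a memory-controller error code (pattern 000F 0000 1MMM CCCC)."""
    if (code & 0xEF80) != 0x0080:  # ignore the filter bit F (bit 12)
        return "not a memory controller error code"
    transaction = MEMORY_TRANSACTIONS.get((code >> 4) & 0x7, "reserved")
    channel = code & 0xF
    where = "channel not specified" if channel == 0xF else f"channel {channel}"
    return f"{transaction}, {where}"


if __name__ == "__main__":
    fields = decode_mci_status(0xCC0001800001009F)
    for name, value in fields.items():
        print(f"{name:20} {value}")
    print(describe_mca_code(fields["mca_error_code"]))
```

    With the value from my log, the UC bit comes out clear and the error code decodes as a memory read with no channel specified, which matches the "correctable" wording of the kernel message.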

    In the IPMI log, I had errors like these:
    4 | 04/25/2012 | 20:17:04 | Memory #0x02 | Uncorrectable ECC | Asserted
    5 | 04/25/2012 | 20:24:33 | Memory #0x1b | Transition to Non-critical from OK
    6 | 04/25/2012 | 20:25:49 | Memory #0x1b | Transition to Critical from less severe
    7 | 04/25/2012 | 20:26:15 | Memory #0x02 | Uncorrectable ECC | Asserted
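
    If the log is long, grouping the entries per memory sensor makes it easier to see which one keeps coming back. This is only a rough sketch: it assumes the pipe-separated layout of the four lines quoted above (record id, date, time, sensor, event, optional state), and the function names are made up for the example.

```python
# Sketch: parse pipe-separated IPMI event log lines and group them by sensor.
# The column layout is assumed from the four sample lines quoted in this post.
from collections import defaultdict
from datetime import datetime

SAMPLE = """\
4 | 04/25/2012 | 20:17:04 | Memory #0x02 | Uncorrectable ECC | Asserted
5 | 04/25/2012 | 20:24:33 | Memory #0x1b | Transition to Non-critical from OK
6 | 04/25/2012 | 20:25:49 | Memory #0x1b | Transition to Critical from less severe
7 | 04/25/2012 | 20:26:15 | Memory #0x02 | Uncorrectable ECC | Asserted
"""


def parse_sel(text: str) -> list:
    """Turn each log line into a dict; lines with too few columns are skipped."""
    entries = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) < 5:
            continue
        record_id, date, time, sensor, event = parts[:5]
        entries.append({
            "id": record_id,  # kept as text, since ids may not be decimal
            "when": datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S"),
            "sensor": sensor,
            "event": event,
            "state": parts[5] if len(parts) > 5 else "",
        })
    return entries


def group_by_sensor(entries: list) -> dict:
    """Map each sensor name to the list of events logged against it."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["sensor"]].append(entry["event"])
    return dict(grouped)


if __name__ == "__main__":
    for sensor, events in group_by_sensor(parse_sel(SAMPLE)).items():
        print(sensor, "->", events)
```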

    I ran a full Memtest check with and without the ECC option enabled, but it reported no errors.

    In the end I was able to isolate the faulty memory by running a live stress-test CD and watching the logs for each RAM module.

    I don't know whether you can do anything about this, but I wanted to let you know.

    Cheers

  2. #2
    According to the logs, it seems that an uncorrectable ECC error (caused by a faulty memory module) is still ultimately corrected in hardware by a higher level of redundancy (i.e. Dell Memory RAID). As long as the error has no impact on memory read/write operations, Memtest86+ will not detect it and software will not be affected.
