FesselEnte
09/02/2009, 11h45
Hello, just a little note about a problem I recently had with a DDR2-800 DIMM that passed all the memtest86+ tests, including the bit fade test (and I ran them for several nights), but which was finally shown to be faulty. <_<
Under Fedora 10, large (gigabyte-sized) files containing only zeros, written to disk with "dd if=/dev/zero" and then re-read, consistently showed bit flips at random locations, spaced at 32-byte intervals (the exact locations changed between reads). For example, one might get the following hexdump output instead of the expected all-zero contents (but only for large files, on the order of the system's memory size):
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
16fcf40 0000 0000 0000 0000 0020 0000 0000 0000
16fcf50 0000 0000 0000 0000 0000 0000 0000 0000
16fcf60 0000 0000 0000 0000 0024 0000 0000 0000
16fcf70 0000 0000 0000 0000 0000 0000 0000 0000
16fcf80 0000 0000 0000 0000 0024 0000 0000 0000
16fcf90 0000 0000 0000 0000 0000 0000 0000 0000
16fcfa0 0000 0000 0000 0000 0024 0000 0000 0000
16fcfb0 0000 0000 0000 0000 0000 0000 0000 0000
16fcfc0 0000 0000 0000 0000 0020 0000 0000 0000
16fcfd0 0000 0000 0000 0000 0000 0000 0000 0000
16fcfe0 0000 0000 0000 0000 0031 0000 0000 0000
16fcff0 0000 0000 0000 0000 0000 0000 0000 0000
*
4df0a660 0000 0000 0000 0000 0004 0000 0000 0000
4df0a670 0000 0000 0000 0000 0000 0000 0000 0000
4df0a680 0000 0000 0000 0000 0020 0000 0000 0000
4df0a690 0000 0000 0000 0000 0000 0000 0000 0000
4df0a6a0 0000 0000 0000 0000 0004 0000 0000 0000
4df0a6b0 0000 0000 0000 0000 0000 0000 0000 0000
*
4df0a6e0 0000 0000 0000 0000 0021 0000 0000 0000
4df0a6f0 0000 0000 0000 0000 0000 0000 0000 0000
4df0a700 0000 0000 0000 0000 0024 0000 0000 0000
4df0a710 0000 0000 0000 0000 0000 0000 0000 0000
4df0a720 0000 0000 0000 0000 0020 0000 0000 0000
4df0a730 0000 0000 0000 0000 0000 0000 0000 0000
4df0a740 0000 0000 0000 0000 0020 0000 0000 0000
4df0a750 0000 0000 0000 0000 0000 0000 0000 0000
4df0a760 0000 0000 0000 0000 0030 0000 0000 0000
4df0a770 0000 0000 0000 0000 0000 0000 0000 0000
4df0a780 0000 0000 0000 0000 0024 0000 0000 0000
4df0a790 0000 0000 0000 0000 0000 0000 0000 0000
*
4df0a7c0 0000 0000 0000 0000 0026 0000 0000 0000
4df0a7d0 0000 0000 0000 0000 0000 0000 0000 0000
4df0a7e0 0000 0000 0000 0000 0037 0000 0000 0000
4df0a7f0 0000 0000 0000 0000 0000 0000 0000 0000
4df0a800 0000 0000 0000 0000 0027 0000 0000 0000
4df0a810 0000 0000 0000 0000 0000 0000 0000 0000
4df0a820 0000 0000 0000 0000 0038 0000 0000 0000
4df0a830 0000 0000 0000 0000 0000 0000 0000 0000
*
73f78000
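To make the 32-byte spacing easier to see than in the raw hexdump, the nonzero bytes can be listed together with their offsets. What follows is only a minimal sketch, assuming GNU cmp and stat are available; the function name list_nonzero_bytes is made up for illustration and was not part of the original test.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): print every nonzero byte
# of a file with its 0-based offset and that offset modulo 32.
list_nonzero_bytes() {
    # Compare exactly the file's length against the endless zero stream.
    # "cmp -l" prints: <1-based byte number> <octal byte in file> <octal byte in /dev/zero>
    cmp -l -n "$(stat -c %s "$1")" "$1" /dev/zero |
    while read -r pos val rest; do
        off=$((pos - 1))
        # "0$val" makes printf parse the octal value reported by cmp
        printf 'offset=0x%08x mod32=%d value=0x%02x\n' "$off" "$((off % 32))" "0$val"
    done
}
```

Called as list_nonzero_bytes entropy, it prints nothing for a clean file. Assuming the dump above came from a little-endian machine, its first bad word would correspond to a line like offset=0x016fcf48 mod32=8 value=0x20.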
Replacing one of the DIMMs fixed this. I have now made a point of putting ECC RAM into that system ... and all my future ones.
So, how often does it happen that memtest86+ finds nothing but the memory is busted anyway? Or why would this problem not be caught by the memtest86+ test suite?
For reference purposes, here is the test script that generated the large files, then re-read them and checked for zeros:
#!/bin/bash
# Write a large file of zeros, then repeatedly re-read it and check that it
# still contains only zeros, growing the file after every clean pass.
BLOCKS=4200000      # initial size in dd blocks (512 bytes each by default)
ERROR=0
SOURCE=/dev/zero
OUTFILE=entropy
BLOCKSSTEP=10000    # blocks appended after each clean pass

# Write initial file full of zeros
echo "Writing initial file of $BLOCKS blocks"
dd if="$SOURCE" of="$OUTFILE" count="$BLOCKS" conv=fdatasync

while [[ $ERROR == 0 ]]; do
    echo -n "Testing $BLOCKS blocks at "
    date
    # hexdump the file full of zeros
    HEXDUMP=hexdump_$(date +%Y%m%d_%H%M%S)
    hexdump "$OUTFILE" > "$HEXDUMP"
    # hexdump collapses repeated lines into "*", so an all-zero file gives one
    # data line, a "*" and the final offset; cut keeps only lines with a space
    LINE=$(cut --fields=1- --delimiter=" " --only-delimited "$HEXDUMP")
    if [[ $LINE != "0000000 0000 0000 0000 0000 0000 0000 0000 0000" ]]; then
        ERROR=1
        echo "Errors found in $HEXDUMP ... dumping a second time"
        hexdump "$OUTFILE" > "${HEXDUMP}_repeat"
    else
        # no errors: append more zeros and test again
        BLOCKS=$((BLOCKS + BLOCKSSTEP))
        dd if="$SOURCE" of="$OUTFILE" \
            count="$BLOCKSSTEP" \
            conv=fdatasync,notrunc \
            status=noxfer \
            oflag=append
    fi
done
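The hexdump-and-cut comparison in the script works, but it writes a full dump file on every pass. A more direct pass/fail check might look like the sketch below, assuming GNU cmp and stat; is_all_zero is a hypothetical helper name, not something from the original script.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): exit status 0 only if
# every byte of the given file is zero, 1 if any byte differs.
is_all_zero() {
    # -n limits the comparison to the file's own length, so /dev/zero never
    # "runs out"; -s suppresses all output and leaves only the exit status.
    cmp -s -n "$(stat -c %s "$1")" "$1" /dev/zero
}
```

In the loop, a test such as "if ! is_all_zero "$OUTFILE"; then" could stand in for the LINE comparison, with hexdump run only once an error has been detected.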