Crash

Author: Frédéric
Date:  
To: guilde
New-Topics: Extraordinary!! (Re: Crash)
Subject: Crash
Hello,

This weekend, one of our machines at work crashed hard, and I would like to
understand why. We rebooted the machine in the morning, at 10:37...

In /var/log/messages, we find:

Sep 1 21:56:52 in22 -- MARK --
Sep 1 22:16:53 in22 -- MARK --
Sep 1 22:36:53 in22 -- MARK --
Sep 1 22:56:53 in22 -- MARK --
Sep 1 23:16:53 in22 -- MARK --
Sep 1 23:36:53 in22 -- MARK --
Sep 1 23:56:53 in22 -- MARK --
Sep 2 00:16:54 in22 -- MARK --
Sep 2 00:36:54 in22 -- MARK --
Sep 2 00:56:54 in22 -- MARK --
Sep 2 01:06:01 in22 kernel: md: syncing RAID array md0
Sep 2 01:06:01 in22 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 2 01:06:01 in22 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep 2 01:06:01 in22 kernel: md: using 128k window, over a total of 1003904 blocks.
Sep 2 01:06:01 in22 kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md2 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md3 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md2 until md3 has finished resync (they share one or more physical units)
Sep 2 01:06:39 in22 kernel: md: md0: sync done.
Sep 2 01:06:39 in22 kernel: md: delaying resync of md2 until md3 has finished resync (they share one or more physical units)
Sep 2 01:06:39 in22 kernel: md: syncing RAID array md1
Sep 2 01:06:39 in22 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 2 01:06:39 in22 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep 2 01:06:39 in22 kernel: md: using 128k window, over a total of 7004224 blocks.
Sep 2 01:06:39 in22 kernel: md: delaying resync of md3 until md1 has finished resync (they share one or more physical units)
Sep 2 01:06:39 in22 kernel: RAID1 conf printout:
Sep 2 01:06:39 in22 kernel: --- wd:2 rd:2
Sep 2 01:06:39 in22 kernel: disk 0, wo:0, o:1, dev:hda1
Sep 2 01:06:39 in22 kernel: disk 1, wo:0, o:1, dev:hdb1
(reboot)
Sep 2 10:37:05 in22 syslogd 1.4.1#18: restart.
Sep 2 10:37:05 in22 kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
Sep 2 10:37:05 in22 kernel: Linux version 2.6.17-2-686 (Debian 2.6.17-9) (waldi@???) (gcc version 4.1.2 20060901 (prerelease) (Debian 4.1.1-13)) #1 SMP Wed Sep 13 16:34:10 UTC 2006
Sep 2 10:37:05 in22 kernel: BIOS-provided physical RAM map:
[...]
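
To put numbers on the excerpt above, here is a small Python sketch that computes how long the md0 resync took and how long the log stayed silent before the reboot. It uses only timestamps copied from the log. The year is a placeholder (syslog lines carry none), and treating md "blocks" as 1 KiB units is an assumption.

```python
from datetime import datetime

# Assumption: syslog timestamps have no year; any year gives the same differences here.
ASSUMED_YEAR = 2007


def parse_ts(line):
    """Parse the 'Mon D HH:MM:SS' prefix of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(
        f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )


# Lines copied verbatim from the log excerpt.
sync_start = parse_ts("Sep 2 01:06:01 in22 kernel: md: syncing RAID array md0")
sync_done = parse_ts("Sep 2 01:06:39 in22 kernel: md: md0: sync done.")
last_line = parse_ts("Sep 2 01:06:39 in22 kernel: disk 1, wo:0, o:1, dev:hdb1")
restart = parse_ts("Sep 2 10:37:05 in22 syslogd 1.4.1#18: restart.")

md0_blocks = 1003904  # "over a total of 1003904 blocks" (assumed 1 KiB each)
md0_seconds = (sync_done - sync_start).total_seconds()
silent_gap = restart - last_line

print(f"md0 resync: {md0_seconds:.0f} s, about {md0_blocks / md0_seconds:.0f} KiB/s")
print(f"log silent for {silent_gap} before syslogd restarted")
```

The MARK lines arrive every 20 minutes, so the absence of one around 01:16:54 suggests the machine stopped logging within roughly ten minutes of the md1 resync starting.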

During boot, we see:

Sep 2 10:39:36 in22 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 2 10:39:40 in22 kernel: hda: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8217661
Sep 2 10:39:45 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:39:49 in22 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 2 10:39:53 in22 kernel: hda: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8217661
Sep 2 10:39:57 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:40:02 in22 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 2 10:40:06 in22 kernel: hda: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8217661
Sep 2 10:40:10 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:40:14 in22 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 2 10:40:19 in22 kernel: hda: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8217661
Sep 2 10:40:23 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:40:27 in22 kernel: hda: DMA disabled
Sep 2 10:40:32 in22 kernel: hdb: DMA disabled
Sep 2 10:40:36 in22 kernel: ide0: reset: success
Sep 2 10:40:40 in22 kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Sep 2 10:40:44 in22 kernel: hda: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8218274
Sep 2 10:40:48 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:40:53 in22 kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Sep 2 10:41:21 in22 kernel: hda: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8218274
Sep 2 10:41:21 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8218274
Sep 2 10:41:22 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8218274
Sep 2 10:41:22 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:41:22 in22 kernel: ide0: reset: success
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8218274
Sep 2 10:41:22 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8218274
Sep 2 10:41:22 in22 kernel: ide: failed opcode was: unknown
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Sep 2 10:41:22 in22 kernel: hda: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8218274
[...]
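
The boot messages above repeat a lot, so a short Python sketch can help check whether they all point at the same sector. It groups the error lines by device, error name and LBA. The regular expression is written only against the line format seen in this log, and the sample contains just three of the lines above.

```python
import re
from collections import Counter

# Three lines copied from the boot log above.
LOG = """\
Sep 2 10:39:40 in22 kernel: hda: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8217661
Sep 2 10:39:53 in22 kernel: hda: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8217661
Sep 2 10:40:44 in22 kernel: hda: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=8218274, high=0, low=8218274, sector=8218274
"""

# Matches the "error=... { Name }, LBAsect=N" lines of the old IDE driver.
ERROR_RE = re.compile(
    r"kernel: (?P<dev>hd[a-z]): (?P<handler>\w+): "
    r"error=0x[0-9a-fA-F]+ \{ (?P<error>\w+) \}, LBAsect=(?P<lba>\d+)"
)


def summarize(log_text):
    """Count error lines per (device, error name, LBA sector)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            counts[(m["dev"], m["error"], int(m["lba"]))] += 1
    return counts


summary = summarize(LOG)
for (dev, error, lba), n in summary.items():
    print(f"{dev}: {error} at LBA {lba}: {n} time(s)")
```

Fed the whole file instead of the sample, the same function would show at a glance how many distinct sectors are involved.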

These messages show up every quarter of an hour, and then stop after a
while:

Sep 2 11:23:50 in22 kernel: ide: failed opcode was: unknown
Sep 2 11:23:50 in22 kernel: end_request: I/O error, dev hda, sector 15939817
Sep 2 11:23:56 in22 kernel: hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Sep 2 11:23:56 in22 kernel: hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=15939817, high=0, low=15939817, sector=15939817
Sep 2 11:23:56 in22 kernel: ide: failed opcode was: unknown
Sep 2 11:23:56 in22 kernel: end_request: I/O error, dev hda, sector 15939817
Sep 2 11:24:18 in22 kernel: md: md1: sync done.
Sep 2 11:24:19 in22 kernel: RAID1 conf printout:
Sep 2 11:24:19 in22 kernel: --- wd:2 rd:2
Sep 2 11:24:19 in22 kernel: disk 0, wo:0, o:1, dev:hda2
Sep 2 11:24:19 in22 kernel: disk 1, wo:0, o:1, dev:hdb2
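
To see where the two failing sectors sit on the disk, their LBA numbers can be converted to byte offsets. This is a minimal sketch that assumes 512-byte sectors, which the log does not state.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors; not stated in the log

# LBA sectors and error names as reported for hda in the excerpts.
failing = {
    8218274: "AddrMarkNotFound",
    15939817: "UncorrectableError",
}

# Byte offset of each failing sector from the start of the disk.
offsets = {lba: lba * SECTOR_SIZE for lba in failing}

for lba, error in failing.items():
    gib = offsets[lba] / 2**30
    print(f"hda LBA {lba} ({error}): byte offset {offsets[lba]}, "
          f"about {gib:.2f} GiB into the disk")
```

Both errors are reported on hda. No read error is logged for hdb in the excerpts.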

It looks like the RAID started resyncing around 1:06 in the morning, and all
it managed to do was bring the machine down! Also, why these messages during
boot, which stop once the RAID is synced? Since then, nothing: everything is
running perfectly.

What do you think?

PS: I cannot check the disks with smartmontools, because it systematically
crashes the machine (it is incompatible with the RAID; it took me weeks to
track down that problem, which crashed the machine every single night!).

--
Frédéric

http://www.gbiloba.org