Linux destroys ext4 - hardware defect?

cuco

Moderator, team member, thread starter
Registered: 1 Dec 2011
Posts: 7,865
I wasn't quite sure whether this belongs in "Linux" or rather in the "Hardware corner".

Yesterday I did a data recovery on a friend's server. The ext4 file system had been badly damaged, and quite a lot of (fortunately unimportant) files were no longer readable. Since the HDD's SMART values looked bad (Raw_Read_Error_Rate ~2000, Current_Pending_Sector ~40, UDMA_CRC_Error_Count ~90), we blamed the disk.
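
The three SMART attributes quoted above do not all describe the same layer. As a rough rule of thumb (an assumption for illustration, not a vendor specification), pending sectors point at the disk media, while UDMA CRC errors are counted on the link between drive and controller. The table `SMART_HINTS` and the helper `summarize` below are hypothetical names, and the values are the approximate ones from the post:

```python
# Rule-of-thumb grouping of the SMART attributes mentioned in the post.
# This grouping is an illustrative assumption, not a diagnostic standard.
SMART_HINTS = {
    "Raw_Read_Error_Rate": "media, vendor-specific (raw values are not comparable across vendors)",
    "Current_Pending_Sector": "media (sectors waiting to be reallocated)",
    "UDMA_CRC_Error_Count": "link (CRC errors on transfers between drive and controller)",
}


def summarize(raw_values):
    """Return (attribute, raw value, hint) for every attribute with a non-zero raw value."""
    report = []
    for name, value in raw_values.items():
        if value > 0:
            report.append((name, value, SMART_HINTS.get(name, "unknown")))
    return report


# Approximate raw values of the old disk, as quoted in the post.
old_disk = {
    "Raw_Read_Error_Rate": 2000,
    "Current_Pending_Sector": 40,
    "UDMA_CRC_Error_Count": 90,
}

for name, value, hint in summarize(old_disk):
    print(f"{name:24} {value:6}  {hint}")
```

A non-zero CRC counter alone does not prove a cabling or controller fault, but it is one value that would not necessarily disappear with a new disk.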

So we put in a new (used) HDD whose SMART values are OK, copied all data over with rsync, repaired GRUB2 and adjusted the mounts in fstab. After that the system booted again.

However: under load, especially when several processes cause I/O on the disk at the same time, the system becomes extremely sluggish and the load climbs to 8-15, until some processes abort because sync() failed.

The syslog shows the same errors again as before the HDD swap.
Here are a few examples from the log:
Code:
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764065] ata7.00: exception Emask 0x0 SAct 0x1c00 SErr 0x0 action 0x6 frozen
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764200] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764284] ata7.00: cmd 61/48:50:98:c1:8e/00:00:15:00:00/40 tag 10 ncq 36864 out
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764284]          res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764502] ata7.00: status: { DRDY }
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764551] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764632] ata7.00: cmd 61/18:58:60:d6:04/01:00:1d:00:00/40 tag 11 ncq 143360 out
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764632]          res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764852] ata7.00: status: { DRDY }
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764901] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764982] ata7.00: cmd 61/08:60:e0:db:45/00:00:05:00:00/40 tag 12 ncq 4096 out
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764982]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:21:07 Bumblebee kernel: [ 4699.765196] ata7.00: status: { DRDY }
Jun 30 05:21:07 Bumblebee kernel: [ 4699.765248] ata7: hard resetting link
Jun 30 05:21:09 Bumblebee kernel: [ 4702.216050] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 30 05:21:09 Bumblebee kernel: [ 4702.253312] ata7.00: configured for UDMA/100
Jun 30 05:21:09 Bumblebee kernel: [ 4702.268044] ata7.00: device reported invalid CHS sector 0
Jun 30 05:21:09 Bumblebee kernel: [ 4702.268056] ata7: EH complete
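
The failed-command lines in this excerpt can be decoded by hand. The sketch below assumes libata's usual print order for the register values (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and that, for WRITE FPDMA QUEUED (0x61), the sector count is carried in the feature registers and the tag in the upper bits of nsect. It also assumes 512-byte sectors. The helper names `decode_ncq_write` and `active_tags` are made up for illustration:

```python
import re

# Twelve hex bytes separated by "/" or ":", followed by the tag and byte count.
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2}) tag (\d+) ncq (\d+) out")

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def decode_ncq_write(line):
    """Decode one 'cmd ...' line of a failed NCQ write; return None if it does not match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    regs = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    (cmd, feature, nsect, lbal, lbam, lbah,
     hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, _device) = regs
    sectors = (hob_feature << 8) | feature
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) \
        | (lbah << 16) | (lbam << 8) | lbal
    return {
        "command": cmd,
        "tag": int(m.group(2)),
        "tag_from_nsect": nsect >> 3,   # NCQ puts the tag in bits 7:3 of nsect
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
        "logged_bytes": int(m.group(3)),
        "lba": lba,
    }


def active_tags(sact):
    """List the NCQ tags whose bits are set in the SAct bitmask."""
    return [bit for bit in range(32) if sact & (1 << bit)]


SAMPLE = "ata7.00: cmd 61/48:50:98:c1:8e/00:00:15:00:00/40 tag 10 ncq 36864 out"

print(decode_ncq_write(SAMPLE))
print(active_tags(0x1c00))
```

For the first failed command this gives 72 sectors (36864 bytes, matching the `ncq 36864` in the log) and tag 10. SAct 0x1c00 decodes to tags 10, 11 and 12, which are exactly the three commands reported as failed.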

Code:
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361341] sd 6:0:0:0: [sda] Unhandled error code
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361349] sd 6:0:0:0: [sda]
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361351] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361353] sd 6:0:0:0: [sda] CDB:
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361356] Write(10): 2a 00 26 a8 8f f8 00 00 08 00
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361364] end_request: I/O error, dev sda, sector 648581112
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361457] EXT4-fs warning (device sda1): ext4_end_bio:317: I/O error -5 writing to inode 12845860 (offset 42987520 size 4096 starting block 81072640)
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361461] Buffer I/O error on device sda1, logical block 81072383
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361584] sd 6:0:0:0: [sda] Unhandled error code
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361587] sd 6:0:0:0: [sda]
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361588] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361590] sd 6:0:0:0: [sda] CDB:
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361591] Write(10): 2a 00 26 a9 9c 00 00 00 d8 00
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361596] end_request: I/O error, dev sda, sector 648649728
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361676] EXT4-fs warning (device sda1): ext4_end_bio:317: I/O error -5 writing to inode 12857718 (offset 0 size 110592 starting block 81081243)
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361679] Buffer I/O error on device sda1, logical block 81080960
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361779] Buffer I/O error on device sda1, logical block 81080961
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361879] Buffer I/O error on device sda1, logical block 81080962
Jun 30 05:22:44 Bumblebee kernel: [ 4796.361995] Buffer I/O error on device sda1, logical block 81080963
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362095] Buffer I/O error on device sda1, logical block 81080964
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362194] Buffer I/O error on device sda1, logical block 81080965
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362293] Buffer I/O error on device sda1, logical block 81080966
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362393] Buffer I/O error on device sda1, logical block 81080967
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362492] Buffer I/O error on device sda1, logical block 81080968
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362660] sd 6:0:0:0: [sda] Unhandled error code
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362663] sd 6:0:0:0: [sda]
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362664] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362666] sd 6:0:0:0: [sda] CDB:
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362667] Write(10): 2a 00 18 87 96 80 00 00 60 00
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362673] end_request: I/O error, dev sda, sector 411539072
Jun 30 05:22:44 Bumblebee kernel: [ 4796.362755] EXT4-fs warning (device sda1): ext4_end_bio:317: I/O error -5 writing to inode 12846636 (offset 0 size 49152 starting block 51442396)
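
The SCSI-level lines in this excerpt carry the same information in a different encoding. A minimal sketch for decoding the Write(10) CDB and relating it to the ext4 messages follows. It assumes 512-byte sectors, 4 KiB file system blocks, and a partition that starts at sector 2048; that start offset is not stated in the post, it is only inferred from the numbers in the log. `decode_write10` and `to_fs_block` are hypothetical helper names:

```python
SECTORS_PER_BLOCK = 8    # assumption: 4 KiB ext4 blocks on 512-byte sectors
PARTITION_START = 2048   # assumption: first sector of sda1, inferred from the log


def decode_write10(cdb_hex):
    """Return (lba, sector_count) from a Write(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a Write(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")    # bytes 7-8: transfer length in sectors
    return lba, count


def to_fs_block(lba):
    """Map a disk sector to a logical block number inside the partition."""
    return (lba - PARTITION_START) // SECTORS_PER_BLOCK


lba, count = decode_write10("2a 00 26 a8 8f f8 00 00 08 00")
print(lba, count, to_fs_block(lba))
```

The first CDB decodes to LBA 648581112 with 8 sectors, which matches the `end_request: I/O error, dev sda, sector 648581112` line, and under the stated assumptions it maps to logical block 81072383, the block named in the following `Buffer I/O error` line.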

Code:
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732073] ata7.00: exception Emask 0x0 SAct 0x7fe SErr 0x0 action 0x6 frozen
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732193] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732279] ata7.00: cmd 61/00:08:00:7c:aa/04:00:26:00:00/40 tag 1 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732279]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732496] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732545] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732625] ata7.00: cmd 61/00:10:00:80:aa/04:00:26:00:00/40 tag 2 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732625]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732843] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732891] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732972] ata7.00: cmd 61/00:18:00:84:aa/04:00:26:00:00/40 tag 3 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.732972]          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.733194] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.733247] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.733328] ata7.00: cmd 61/00:20:00:88:aa/04:00:26:00:00/40 tag 4 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.733328]          res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.733544] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.733593] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.733674] ata7.00: cmd 61/00:28:00:8c:aa/04:00:26:00:00/40 tag 5 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.733674]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.743264] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.748076] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.752770] ata7.00: cmd 61/00:30:00:90:aa/04:00:26:00:00/40 tag 6 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.752770]          res 40/00:ff:ff:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.762108] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.766771] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.771393] ata7.00: cmd 61/00:38:00:94:aa/04:00:26:00:00/40 tag 7 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.771393]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.780790] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.785471] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.790119] ata7.00: cmd 61/00:40:00:98:aa/04:00:26:00:00/40 tag 8 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.790119]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.799582] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.804314] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.809007] ata7.00: cmd 61/00:48:00:9c:aa/04:00:26:00:00/40 tag 9 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.809007]          res 40/00:ff:ff:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.818478] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.823214] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:34:57 Bumblebee kernel: [ 5529.827958] ata7.00: cmd 61/00:50:00:a0:aa/04:00:26:00:00/40 tag 10 ncq 524288 out
Jun 30 05:34:57 Bumblebee kernel: [ 5529.827958]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 30 05:34:57 Bumblebee kernel: [ 5529.837808] ata7.00: status: { DRDY }
Jun 30 05:34:57 Bumblebee kernel: [ 5529.842859] ata7: hard resetting link
Jun 30 05:34:59 Bumblebee kernel: [ 5532.296057] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 30 05:35:00 Bumblebee kernel: [ 5532.335105] ata7.00: configured for UDMA/100
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348046] ata7.00: device reported invalid CHS sector 0
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348051] ata7.00: device reported invalid CHS sector 0
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348057] ata7.00: device reported invalid CHS sector 0
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348061] ata7.00: device reported invalid CHS sector 0
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348065] ata7.00: device reported invalid CHS sector 0
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348068] ata7.00: device reported invalid CHS sector 0

Code:
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348086] sd 6:0:0:0: [sda] Unhandled error code
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348088] sd 6:0:0:0: [sda]
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348091] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348093] sd 6:0:0:0: [sda] CDB:
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348095] Write(10): 2a 00 26 aa 28 00 00 04 00 00
Jun 30 05:35:00 Bumblebee kernel: [ 5532.348105] end_request: I/O error, dev sda, sector 648685568
Jun 30 05:35:00 Bumblebee kernel: [ 5532.353317] EXT4-fs warning (device sda1): ext4_end_bio:317: I/O error -5 writing to inode 12846129 (offset 16777216 size 8388608 starting block 81085824)
Jun 30 05:35:00 Bumblebee kernel: [ 5532.353324] buffer_io_error: 30 callbacks suppressed
Jun 30 05:35:00 Bumblebee kernel: [ 5532.353329] Buffer I/O error on device sda1, logical block 81085440
Jun 30 05:35:00 Bumblebee kernel: [ 5532.358623] Buffer I/O error on device sda1, logical block 81085441
Jun 30 05:35:00 Bumblebee kernel: [ 5532.363917] Buffer I/O error on device sda1, logical block 81085442
Jun 30 05:35:00 Bumblebee kernel: [ 5532.369236] Buffer I/O error on device sda1, logical block 81085443
Jun 30 05:35:00 Bumblebee kernel: [ 5532.374554] Buffer I/O error on device sda1, logical block 81085444
Jun 30 05:35:00 Bumblebee kernel: [ 5532.379778] Buffer I/O error on device sda1, logical block 81085445
Jun 30 05:35:00 Bumblebee kernel: [ 5532.384893] Buffer I/O error on device sda1, logical block 81085446
Jun 30 05:35:00 Bumblebee kernel: [ 5532.389910] Buffer I/O error on device sda1, logical block 81085447
Jun 30 05:35:00 Bumblebee kernel: [ 5532.394823] Buffer I/O error on device sda1, logical block 81085448
Jun 30 05:35:00 Bumblebee kernel: [ 5532.399641] Buffer I/O error on device sda1, logical block 81085449
Jun 30 05:35:00 Bumblebee kernel: [ 5532.404421] sd 6:0:0:0: [sda] Unhandled error code
Jun 30 05:35:00 Bumblebee kernel: [ 5532.404423] sd 6:0:0:0: [sda]
Jun 30 05:35:00 Bumblebee kernel: [ 5532.404424] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jun 30 05:35:00 Bumblebee kernel: [ 5532.404426] sd 6:0:0:0: [sda] CDB:
Jun 30 05:35:00 Bumblebee kernel: [ 5532.404427] Write(10): 2a 00 26 aa 44 00 00 04 00 00
Jun 30 05:35:00 Bumblebee kernel: [ 5532.404433] end_request: I/O error, dev sda, sector 648692736
Jun 30 05:35:00 Bumblebee kernel: [ 5532.409146] EXT4-fs warning (device sda1): ext4_end_bio:317: I/O error -5 writing to inode 12846129 (offset 16777216 size 8388608 starting block 81086720)
Jun 30 05:35:00 Bumblebee kernel: [ 5532.409243] sd 6:0:0:0: [sda] Unhandled error code
Jun 30 05:35:00 Bumblebee kernel: [ 5532.409245] sd 6:0:0:0: [sda]
Jun 30 05:35:00 Bumblebee kernel: [ 5532.409246] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jun 30 05:35:00 Bumblebee kernel: [ 5532.409248] sd 6:0:0:0: [sda] CDB:
Jun 30 05:35:00 Bumblebee kernel: [ 5532.409249] Write(10): 2a 00 26 aa 48 00 00 04 00 00
Jun 30 05:35:00 Bumblebee kernel: [ 5532.409255] end_request: I/O error, dev sda, sector 648693760
Jun 30 05:35:00 Bumblebee kernel: [ 5532.413879] EXT4-fs warning (device sda1): ext4_end_bio:317: I/O error -5 writing to inode 12846129 (offset 25165824 size 8388608 starting block 81086848)

And many more... If I hadn't just swapped the HDD, I would have said: clearly, the HDD is dead. But now I wonder: what is the actual problem? Could the SATA controller on the mainboard be broken? Previously the HDDs were attached to a backplane; currently the disk is connected directly to the board, so it can't be the backplane.
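
One way to gather evidence for or against the controller is to check which port the errors are reported on, and whether that changes when the disk is moved to another port. A minimal sketch, assuming kernel log lines in the format shown above: `tally` is a hypothetical helper, and the sample lines are copied from the first excerpt.

```python
import re
from collections import Counter

# Matches "ata7: ..." as well as "ata7.00: ..." and captures port and message.
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.*)")


def tally(lines):
    """Count failed commands and hard link resets per ATA port."""
    counts = Counter()
    for line in lines:
        m = PORT_RE.search(line)
        if m is None:
            continue
        port, msg = m.groups()
        if msg.startswith("failed command:"):
            counts[(port, "failed command")] += 1
        elif msg.startswith("hard resetting link"):
            counts[(port, "link reset")] += 1
    return counts


SAMPLE = """\
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764200] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764551] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:21:07 Bumblebee kernel: [ 4699.764901] ata7.00: failed command: WRITE FPDMA QUEUED
Jun 30 05:21:07 Bumblebee kernel: [ 4699.765248] ata7: hard resetting link
""".splitlines()

for (port, event), n in sorted(tally(SAMPLE).items()):
    print(f"{port:6} {event:15} {n}")
```

Run against the full syslog before and after moving the disk to a different port, a tally like this would show whether the errors follow the disk or stay with the port.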

Hardware:
Server from Rackable Systems (should be roughly this one: http://www.ebay.com/itm/RACKABLE-SYSTEMS-2U-2x-2-33GHz-Dual-Core-4GB-RAM-4x-250GB-HDD-/171248157660)
2x Xeon X5355
8x 2 GB ECC RAM
currently 1x 2.5" 500 GB HDD.
 
You could put it that way ;) The rack does hold about 20 servers, but only 1-2 of them handle really "important" things (routing, firewall, DHCP, ...). The rest consists of various "experiments" by students (electrical engineering, mechatronics, computer science), most of whom are working with a server for the first time. Their hardware therefore mostly consists of rather aged machines of all kinds of designs and manufacturers, so there are no common safeguards that apply to all of them (UPS, RAID, backup).

So the two of us who look after the rack do have to help out fairly often with the users' little "catastrophes", and sometimes have to work our way into adventurous configurations ;)

But that's just background info - I haven't been able to get to the affected server in the last two days (and I don't tinker with it without the owner), so I haven't been able to try anything else yet.
 