I ran into the following problem.
I have an Adaptec RAID 3805 controller with seven 1 TB disks attached, assembled into a RAID 5 array. There is no battery backup unit (BBU), and the write caches on both the disks and the controller are disabled.
Right after the controller was installed, one of the disks started dropping out of the array. The disk drops out, a rebuild starts, and the array returns to a normal state. This happened about once a month, so we decided to replace that disk. After the replacement the array ran for another 2-3 months, and now the disk has dropped out again.
Here is part of the log:
Device event log for controller 1
Vendor/Model S/N (20 chars max) WWN (SAS only) Parity errors Link errors HW errors Cmd aborts Medium errors SMART error SMART warnings
ST310003 33AS 9TE16JTS 0000000000000000 0 0 0 1720 0 false 0
ST310003 33AS 6TE0HMPZ 0000000000000000 0 0 0 1133 0 false 0
ST310003 33AS 5TE0EX65 0000000000000000 0 0 0 1493 0 false 0
WDC WD10EARS WD-WMAV50880208 0000000000000000 0 0 0 1124 0 false 0
ST310003 33AS 6TE0GK0D 0000000000000000 0 0 0 1884 0 false 0
WDC WD10EARS WD-WMAV50881597 0000000000000000 0 0 0 1004 0 false 0
ST1000VX 000-1CU1 Z1D98ZVR 0000000000000000 0 0 0 174 0 false 0
Defunct drive event log for controller 1
Date and time Vendor/Model S/N (20 chars max) WWN (SAS only) Failure code Description
February 05, 2014 8:53:05 AM MSK ST310003 33AS 6TE0HMPZ 0000000000000000 0x2 Selection timeout: device removed or not responding
February 15, 2014 9:06:19 AM MSK ST310003 33AS 6TE0HMPZ 0000000000000000 0x2 Selection timeout: device removed or not responding
February 22, 2014 12:33:13 AM MSK ST310003 33AS 6TE0HMPZ 0000000000000000 0x2 Selection timeout: device removed or not responding
February 27, 2014 10:28:44 AM MSK ST310003 33AS 6TE0HMPZ 0000000000000000 0x2 Selection timeout: device removed or not responding
March 04, 2014 1:35:55 PM MSK ST310003 33AS 6TE0HMPZ 0000000000000000 0x2 Selection timeout: device removed or not responding
March 26, 2014 2:10:45 AM MSK ST310003 33AS 6TE0HMPZ 0000000000000000 0x2 Selection timeout: device removed or not responding
April 30, 2014 3:51:04 AM MSD ST310003 33AS 6TE0HMPZ 0000000000000000 0x2 Selection timeout: device removed or not responding
June 25, 2014 7:58:54 AM MSD ST1000VX 000-1CU1 Z1D98ZVR 0000000000000000 0x2 Selection timeout: device removed or not responding
Soft event log for controller 1
May 31, 2014 7:43:35 AM MSD INF PPI update. Age 469
May 31, 2014 7:43:48 AM MSD ERR Battery has degraded to the dead state: controller 1
June 3, 2014 2:18:41 AM MSD INF Container changed: controller 1, logical device 0
June 3, 2014 2:18:42 AM MSD INF Container changed: controller 1, logical device 0
June 3, 2014 2:12:59 AM MSD INF PPI update. Age 470
June 3, 2014 2:13:08 AM MSD INF PPI update. Age 471
June 3, 2014 2:13:21 AM MSD ERR Battery has degraded to the dead state: controller 1
June 14, 2014 1:39:33 PM MSD INF Container changed: controller 1, logical device 0
June 14, 2014 1:38:01 PM MSD INF PPI update. Age 472
June 14, 2014 1:38:10 PM MSD INF PPI update. Age 473
June 14, 2014 1:38:23 PM MSD ERR Battery has degraded to the dead state: controller 1
June 25, 2014 7:58:57 AM MSD INF New device found: controller 1, channel 0, SCSI device ID 1, LUN 0
June 25, 2014 7:58:57 AM MSD WRN An error occurred while accessing the logical device: controller 1, logical device 0
June 25, 2014 7:58:57 AM MSD ERR Drive in a RAID-5 set failed: controller 1, logical device 0
June 25, 2014 7:58:57 AM MSD ERR Disk failed: controller 1, channel 0, SCSI device ID 1
June 25, 2014 7:58:57 AM MSD INF Drive removed: controller 1, channel 0, SCSI device ID 1
June 25, 2014 7:58:58 AM MSD INF Drive inserted: controller 1, channel 0, SCSI device ID 1
June 25, 2014 7:58:58 AM MSD INF Container changed: controller 1, logical device 0
June 25, 2014 7:58:59 AM MSD WRN RAID-5 failover operation failed because there are no failover devices assigned to this RAID-5 set: controller 1, logical device 0
June 25, 2014 7:59:20 AM MSD ERR Disk failed: controller 1, channel 0, SCSI device ID 1
June 25, 2014 7:59:21 AM MSD INF PPI update. Age 474
June 25, 2014 7:59:23 AM MSD INF PPI update. Age 475
June 25, 2014 7:59:24 AM MSD INF PPI update. Age 476
June 25, 2014 7:59:24 AM MSD INF Configuration has changed.
June 25, 2014 7:59:25 AM MSD INF Failover disk changed: controller 1, logical device 0
June 25, 2014 7:59:25 AM MSD INF Failover and rebuild operation started on a RAID-5 set: controller 1, logical device 0
June 25, 2014 7:59:25 AM MSD INF Container changed: controller 1, logical device 0
June 25, 2014 7:59:27 AM MSD INF Configuration has changed.
ST1000VX 000-1CU1 is the new disk.
The log shows that the symptoms are the same as with the old disk: it drops out on a timeout. SMART is excellent on both disks. The only thing that bothers me is the "Cmd aborts" counts in the log; I don't know how normal they are. They appear on different disks, yet only one disk drops out.
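To make the pattern in the defunct drive log easier to see, the failures can be tallied per serial number together with the intervals between them. This is only a minimal Python sketch: the regular expression assumes the row layout shown in the excerpt above, the function names `failures_by_serial` and `gaps_in_days` are made up for illustration, and the time zone label (MSK/MSD) is ignored, so timestamps are treated as naive local times.

```python
import re
from collections import defaultdict
from datetime import datetime

# One row of the "Defunct drive event log". The layout is assumed from the
# excerpt: date, time zone, model, serial, 16-hex-digit WWN, code, description.
ROW = re.compile(
    r"^(?P<ts>\w+ \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (?P<tz>\w+) "
    r"(?P<model>.+?) (?P<serial>\S+) (?P<wwn>[0-9A-Fa-f]{16}) "
    r"(?P<code>0x[0-9A-Fa-f]+) (?P<desc>.*)$"
)


def failures_by_serial(log_text):
    """Return {serial: sorted list of failure datetimes} for matching rows."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skip headers, soft event log rows, and anything else
        # Time zone label is dropped on purpose (assumption: local time is enough).
        when = datetime.strptime(m["ts"], "%B %d, %Y %I:%M:%S %p")
        events[m["serial"]].append(when)
    return {serial: sorted(times) for serial, times in events.items()}


def gaps_in_days(times):
    """Whole days between consecutive failures of one drive."""
    return [(later - earlier).days for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Two rows copied from the log above; paste the full log to analyze it all.
    sample = (
        "February 05, 2014 8:53:05 AM MSK ST310003 33AS 6TE0HMPZ "
        "0000000000000000 0x2 Selection timeout: device removed or not responding\n"
        "June 25, 2014 7:58:54 AM MSD ST1000VX 000-1CU1 Z1D98ZVR "
        "0000000000000000 0x2 Selection timeout: device removed or not responding\n"
    )
    for serial, times in failures_by_serial(sample).items():
        print(serial, len(times), gaps_in_days(times))
```

Comparing the per-serial counts with the channel and SCSI device ID reported in the soft event log may show whether the failures follow a particular drive or a particular port.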
What can I do to find out for certain what the problem is?