Xyratex RS-1220-F4-F5214E periodically loses drives

abr_question
Junior member
Posts: 2
Registered: 12 Oct 2012, 12:43
Location: Krasnodar

Post by abr_question » 12 Oct 2012, 13:33

Good afternoon! The problem is described in the subject line. Here are some additional details:

2 controllers (Upper, Lower)
Firmware Version: 3.5 Build 0022
StorView v3.10.0003

Drives: 16 × SAS Seagate ST3146854SS, revision 0003, 146 GB

The shelf serves as storage for VMware. Three OSes run on VMware (2 × Ubuntu 12.04 Server and one Windows XP).
2 RAID arrays:
slots 1 and 2: RAID-1 (systems)
the rest: RAID-6 (data)
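For reference, a rough usable-capacity calculation for this layout, as a minimal Python sketch. It assumes every bay holds the same 146 GB drive and that "the rest" means the remaining 14 of the 16 drives; the helper names are made up for illustration and do not come from StorView.

```python
# Minimal sketch: usable capacity of the two arrays described above.
# Assumptions (not taken from the StorView configuration): every bay holds
# the same 146 GB drive, and "the rest" means the remaining 14 of 16 drives.
DRIVE_GB = 146
TOTAL_DRIVES = 16
RAID1_DRIVES = 2


def raid1_usable_gb(n_drives: int, size_gb: int) -> int:
    """A RAID-1 mirror exposes the capacity of a single member drive."""
    if n_drives < 2:
        raise ValueError("RAID-1 needs at least 2 drives")
    return size_gb


def raid6_usable_gb(n_drives: int, size_gb: int) -> int:
    """RAID-6 reserves two drives' worth of space for P and Q parity."""
    if n_drives < 4:
        raise ValueError("RAID-6 needs at least 4 drives")
    return (n_drives - 2) * size_gb


system_gb = raid1_usable_gb(RAID1_DRIVES, DRIVE_GB)
data_gb = raid6_usable_gb(TOTAL_DRIVES - RAID1_DRIVES, DRIVE_GB)
print(f"RAID-1 (systems): {system_gb} GB")
print(f"RAID-6 (data): {data_gb} GB")
```

Under those assumptions this prints 146 GB for the system mirror and 12 × 146 = 1752 GB for the data array.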

Both controllers show Voltage: Failed for the 5V Protected and 5V Input readings (4.73 V and 4.68 V respectively, against a minimum of 4.75 V for each).
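To put the readings in perspective, here is a small Python sketch comparing them with the limit. It assumes the 4.75 V minimum corresponds to a nominal 5.00 V rail with a -5 % tolerance (5.00 × 0.95 = 4.75); StorView itself only reports the minimum, so the nominal value is an assumption.

```python
# Compare the reported 5 V rail readings with the 4.75 V lower limit.
# Assumption: nominal rail is 5.00 V with a -5 % tolerance (5.00 * 0.95 = 4.75).
NOMINAL_V = 5.00
MIN_V = 4.75

readings = {"5V Protected": 4.73, "5V Input": 4.68}


def deviation_pct(measured: float, nominal: float = NOMINAL_V) -> float:
    """Signed deviation from the nominal voltage, in percent."""
    return (measured - nominal) / nominal * 100.0


for name, volts in readings.items():
    status = "OK" if volts >= MIN_V else "FAILED"
    print(f"{name}: {volts:.2f} V ({deviation_pct(volts):+.1f} %) {status}")
```

Both rails come out below the limit: 5V Protected at about -5.4 % and 5V Input at about -6.4 % of nominal.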

In my time here, the problems began when we bought SATA-II Samsung ST1000DM005 drives. The shelf did not see them right away (StorView showed the slot as empty, while the physical slot had its green and orange LEDs lit at the same time); it took about ten attempts to get them recognized. They were identified as Seagate, but with Samsung specifications and serial numbers. A RAID-5 array on them was created normally, but when we created a VMware virtual disk across the whole space, the Xyratex dropped the drives one by one (the slots showed as empty in StorView) before virtual disk creation had even reached 30%.

Before I arrived, the shelf had SATA-II Seagate Barracuda ES 750 GB drives, which were also lost about once every two days.
There are also the SAS Seagate drives described above. While the OSes, DBMS and other add-ons were being installed on VMware, the shelf behaved normally. We have now put it into production by starting a 1C server that stores its database on this shelf. Since then, a drive drops out twice a day, and it is a completely random one each time.

I am thinking of updating the controller and expansion firmware. One manual even recommends updating the drive firmware.
One possible cause I am considering is undervoltage, with the supply failing to deliver a full 5 V, but there is a catch. The shelf's two power supplies are each connected to a separate UPS, and both UPSes regulate the output voltage. Moreover, when we first decided to deploy services on VMware, there were no voltage errors at all. The 5V Protected error appeared gradually, and now a 5V Input error has been added. This is despite the fact that fewer devices are actually running now (user workstations were switched off after staff cuts, redundant switches were disconnected, and a couple of unneeded servers were shut down).

What other hypotheses are there besides checking the mains voltage and updating the software? And where would you recommend starting?

Logs:

1.
----------
10/12/12 06:03:04 Configuration WWN: 20000050CC201669 Controller: 1 The writeback cache on StorageDB (Array 1) has been disabled. Reason(s): The array has become critical.
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x00, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x28, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x28, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 StorageDB (Array 1) is in a critical state.
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) (StorageDB (Array 1) Drive 1) has failed due to an unrecoverable error. Sense Data: 02/04/01.
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x28, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x28, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x28, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x28, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:01 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:01 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x28, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:01 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:01 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:01 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:03:01 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x28, Status = 0x02 (02/04/01).
Error Message 10/12/12 06:02:56 Configuration WWN: 20000050CC201669 Controller: 0 A SAS command was aborted on the drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) for the SCSI Op Code 0x2A.
Error Message 10/12/12 06:02:56 Configuration WWN: 20000050CC201669 Controller: 0 A SAS command was aborted on the drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) for the SCSI Op Code 0x2A.
Error Message 10/12/12 06:02:56 Configuration WWN: 20000050CC201669 Controller: 0 A SAS command was aborted on the drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) for the SCSI Op Code 0x2A.
Information Message 10/12/12 06:02:56 Configuration WWN: 20000050CC201669 Controller: 0 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/12/12 06:02:56 Configuration WWN: 20000050CC201669 Controller: 0 A discovery process has started to determine all SAS devices on the SAS domain.
Information Message 10/12/12 06:02:54 Configuration WWN: 20000050CC201669 Controller: 1 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/12/12 06:02:54 Configuration WWN: 20000050CC201669 Controller: 1 A discovery process has started to determine all SAS devices on the SAS domain.
Information Message 10/12/12 06:02:53 Configuration WWN: 20000050CC201669 Controller: 1 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/12/12 06:02:53 Configuration WWN: 20000050CC201669 Controller: 1 A discovery process has started to determine all SAS devices on the SAS domain.
Information Message 10/12/12 06:02:53 Configuration WWN: 20000050CC201669 Controller: 0 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/12/12 06:02:53 Configuration WWN: 20000050CC201669 Controller: 0 A discovery process has started to determine all SAS devices on the SAS domain.
Error Message 10/12/12 06:02:53 Configuration WWN: 20000050CC201669 Controller: 0 The controller has generated a LIP on Drive Loop -1, due to a loop error.
----------
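The codes in the "SCSI Info" fields can be decoded with the standard SCSI tables: status 0x02 is CHECK CONDITION; opcodes 0x00, 0x28 and 0x2A are TEST UNIT READY, READ(10) and WRITE(10); sense key 2 is NOT READY, and ASC/ASCQ 04h/01h is "logical unit is in process of becoming ready". The Python sketch below assumes the triple in parentheses is sense key / ASC / ASCQ in hex, which matches how the same triple appears after "Sense Data:" in the drive-failure line. One plausible reading is that the drive was answering "not ready yet", as a drive might right after a reset or a power dip, rather than reporting media errors; the lookup tables only cover the codes seen in these excerpts.

```python
import re
from typing import Optional

# Only the codes that appear in these excerpts (from the SCSI SPC/SBC tables).
OPCODES = {0x00: "TEST UNIT READY", 0x28: "READ(10)", 0x2A: "WRITE(10)"}
STATUS = {0x02: "CHECK CONDITION"}
SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {(0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY"}

# Assumption: "(KK/AA/QQ)" is sense key / ASC / ASCQ in hexadecimal.
LINE_RE = re.compile(
    r"\(Slot (?P<slot>\d+), Enclosure \d+\).*?"
    r"Operation = 0x(?P<op>[0-9A-Fa-f]{2}), Status = 0x(?P<status>[0-9A-Fa-f]{2}) "
    r"\((?P<key>[0-9A-Fa-f]{2})/(?P<asc>[0-9A-Fa-f]{2})/(?P<ascq>[0-9A-Fa-f]{2})\)"
)


def decode(line: str) -> Optional[dict]:
    """Decode the SCSI Info part of a 'returned a bad status' log line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op = int(m["op"], 16)
    status = int(m["status"], 16)
    key, asc, ascq = (int(m[g], 16) for g in ("key", "asc", "ascq"))
    return {
        "slot": int(m["slot"]),
        "operation": OPCODES.get(op, f"0x{op:02X}"),
        "status": STATUS.get(status, f"0x{status:02X}"),
        "sense_key": SENSE_KEYS.get(key, f"0x{key:X}"),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:02X}/{ascq:02X}"),
    }


sample = ("Error Message 10/12/12 06:03:02 Configuration WWN: 20000050CC201669 "
          "Controller: 0 The drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) "
          "returned a bad status while completing a command. "
          "SCSI Info: Operation = 0x2A, Status = 0x02 (02/04/01).")
print(decode(sample))
```

Lines without a SCSI Info field (discovery, LIP, "aborted" messages) simply return None.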

2.
----------
10/11/12 16:11:47 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x00, Status = 0x02 (02/04/01).
Error Message 10/11/12 16:11:46 Configuration WWN: 20000050CC201669 Controller: 1 The drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x00, Status = 0x02 (02/04/01).
Error Message 10/11/12 16:11:46 Configuration WWN: 20000050CC201669 Controller: 1 The drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x00, Status = 0x02 (02/04/01).
Error Message 10/11/12 16:11:45 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x00, Status = 0x02 (02/04/01).
Error Message 10/11/12 16:11:44 Configuration WWN: 20000050CC201669 Controller: 1 The drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x00, Status = 0x02 (02/04/01).
Error Message 10/11/12 16:11:44 Configuration WWN: 20000050CC201669 Controller: 1 The drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) returned a bad status while completing a command. SCSI Info: Operation = 0x00, Status = 0x02 (02/04/01).
Information Message 10/11/12 16:11:44 Configuration WWN: 20000050CC201669 Controller: 1 A drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) has been inserted.
Information Message 10/11/12 16:11:41 Configuration WWN: 20000050CC201669 Controller: 0 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:41 Configuration WWN: 20000050CC201669 Controller: 0 A discovery process has started to determine all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:40 Configuration WWN: 20000050CC201669 Controller: 1 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:40 Configuration WWN: 20000050CC201669 Controller: 1 A discovery process has started to determine all SAS devices on the SAS domain.
Warning Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 1 The writeback cache on StorageDB (Array 1) has been disabled. Reason(s): The array has become critical.
Error Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 1 StorageDB (Array 1) is in a critical state.
Error Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 1 The drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) (StorageDB (Array 1) Drive 2) has been marked as failed because it was removed.
Information Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 1 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 1 A discovery process has started to determine all SAS devices on the SAS domain.
Warning Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 0 The writeback cache on StorageDB (Array 1) has been disabled. Reason(s): The array has become critical.
Error Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 0 StorageDB (Array 1) is in a critical state.
Error Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 0 The drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) (StorageDB (Array 1) Drive 2) has been marked as failed because it was removed.
Error Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 0 A drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) has been removed.
Information Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 0 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:39 Configuration WWN: 20000050CC201669 Controller: 0 A discovery process has started to determine all SAS devices on the SAS domain.
Error Message 10/11/12 16:11:38 Configuration WWN: 20000050CC201669 Controller: 0 The controller has generated a LIP on Drive Loop -1, due to a loop error.
Error Message 10/11/12 16:11:36 Configuration WWN: 20000050CC201669 Controller: 0 A SAS command was aborted on the drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) for the SCSI Op Code 0x2A.
Error Message 10/11/12 16:11:36 Configuration WWN: 20000050CC201669 Controller: 0 A SAS command was aborted on the drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) for the SCSI Op Code 0x2A.
Error Message 10/11/12 16:11:36 Configuration WWN: 20000050CC201669 Controller: 0 A SAS command was aborted on the drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) for the SCSI Op Code 0x2A.
Information Message 10/11/12 16:11:36 Configuration WWN: 20000050CC201669 Controller: 0 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:36 Configuration WWN: 20000050CC201669 Controller: 0 A discovery process has started to determine all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:34 Configuration WWN: 20000050CC201669 Controller: 1 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:34 Configuration WWN: 20000050CC201669 Controller: 1 A discovery process has started to determine all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:34 Configuration WWN: 20000050CC201669 Controller: 0 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:34 Configuration WWN: 20000050CC201669 Controller: 0 A discovery process has started to determine all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:33 Configuration WWN: 20000050CC201669 Controller: 1 The discovery process has completed identifying all SAS devices on the SAS domain.
Information Message 10/11/12 16:11:33 Configuration WWN: 20000050CC201669 Controller: 1 A discovery process has started to determine all SAS devices on the SAS domain.
Error Message 10/11/12 16:11:33 Configuration WWN: 20000050CC201669 Controller: 0 The controller has generated a LIP on Drive Loop -1, due to a loop error.
----------
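Since the drive that drops changes from one incident to the next, it can help to tally events per slot and line them up against the controller's loop resets (LIPs). The Python sketch below does that for a handful of lines from excerpt 2. It assumes timestamps are MM/DD/YY HH:MM:SS, as the dates in the excerpts suggest; the event categories are hypothetical labels chosen for this sketch, and the "Configuration WWN" prefix is omitted from the sample lines for brevity.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: timestamps look like MM/DD/YY HH:MM:SS, as in the excerpts above.
TS_RE = re.compile(r"\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}")
SLOT_RE = re.compile(r"\(Slot (\d+), Enclosure \d+\)")

# Hypothetical categories for this sketch, matched by message substring.
EVENTS = {
    "aborted": ("SAS command was aborted",),
    "bad_status": ("returned a bad status",),
    "failed": ("has failed due to", "has been marked as failed"),
    "removed": ("has been removed",),
    "inserted": ("has been inserted",),
}
LIP_TEXT = "generated a LIP"


def summarize(lines):
    """Count drive events per slot and collect the times of loop resets (LIPs)."""
    per_slot = defaultdict(Counter)
    lips = []
    for line in lines:
        ts = TS_RE.search(line)
        if ts is None:
            continue
        when = datetime.strptime(ts.group(0), "%m/%d/%y %H:%M:%S")
        if LIP_TEXT in line:
            lips.append(when)
            continue
        slot = SLOT_RE.search(line)
        if slot is None:
            continue  # discovery messages etc. carry no slot number
        for kind, needles in EVENTS.items():
            if any(n in line for n in needles):
                per_slot[int(slot.group(1))][kind] += 1
    return per_slot, sorted(lips)


# A few lines from excerpt 2 ("Configuration WWN: ..." prefix omitted).
SAMPLE = [
    "Error Message 10/11/12 16:11:33 Controller: 0 The controller has generated a LIP on Drive Loop -1, due to a loop error.",
    "Error Message 10/11/12 16:11:36 Controller: 0 A SAS command was aborted on the drive w/ WWN 5000C500000A157C (Slot 6, Enclosure 1) for the SCSI Op Code 0x2A.",
    "Error Message 10/11/12 16:11:38 Controller: 0 The controller has generated a LIP on Drive Loop -1, due to a loop error.",
    "Error Message 10/11/12 16:11:39 Controller: 0 A drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) has been removed.",
    "Error Message 10/11/12 16:11:39 Controller: 0 The drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) (StorageDB (Array 1) Drive 2) has been marked as failed because it was removed.",
    "Information Message 10/11/12 16:11:44 Controller: 1 A drive w/ WWN 5000C5000002ADB0 (Slot 8, Enclosure 1) has been inserted.",
]

per_slot, lips = summarize(SAMPLE)
for slot_no, counts in sorted(per_slot.items()):
    print(f"slot {slot_no}: {dict(counts)}")
print("LIPs at:", ", ".join(t.strftime("%H:%M:%S") for t in lips))
```

In this sample both LIPs fall within a few seconds before slot 8 is removed, with aborted commands on slot 6 in between; excerpt 1 shows the same LIP-then-drive-error order for slot 6. This is only an aid for reading the logs, not a diagnosis.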


Re: Xyratex RS-1220-F4-F5214E periodically loses drives

Post by abr_question » 21 Oct 2012, 13:56

Problem solved. The cause really was a failed power supply. I installed new power supplies, and so far everything has been running fine for a week.
