We have a system built from three Xyratex 5412E (RS-1220-F4-5412E-2048-DL2) units, a Qlogic 5600 switch, and four servers.
Three of the servers have Qlogic 2560 HBAs, and one has a 2460. The HBAs are single-port and connected to the switch.
The enclosure has two controllers with two ports each, all connected to the switch.
Two of the servers run Citrix XenServer 5.6, the other two run FreeBSD amd64 8.x.
Until recently everything worked fine, but as the load grew we started seeing 'loss of connectivity'.
Under moderately heavy writes (~80 Gb/min), whether from FreeBSD or from Citrix, the active LUNs drop off.
Multipath was configured initially, and all 4 paths dropped at once (that was more of an experiment, since the HBAs are single-port anyway), but the same thing happens without it.
Here is what it looks like (excerpts pulled from different parts of the logs):
FreeBSD
Aug 27 10:36:28 quattro kernel: (da27:isp0:0:3:1): lost device
Aug 27 10:36:28 quattro kernel: (da27:isp0:0:3:1): removing device entry
Aug 27 10:36:28 quattro kernel: (da28:isp0:0:3:2): lost device
Aug 27 10:36:28 quattro kernel: (da28:isp0:0:3:2): removing device entry
Aug 27 10:36:28 quattro kernel: (da29:isp0:0:3:3): lost device
Aug 27 10:36:28 quattro kernel: (da29:isp0:0:3:3): removing device entry
Aug 27 10:36:28 quattro kernel: (da30:isp0:0:3:4): lost device
Aug 27 10:36:28 quattro kernel: (da30:isp0:0:3:4): removing device entry
Aug 27 10:36:28 quattro kernel: (da31:isp0:0:3:5): lost device
Aug 27 10:36:28 quattro kernel: (da31:isp0:0:3:5): removing device entry
Aug 27 10:36:28 quattro kernel: (da32:GEOM_MULTIPATH: da32 orphaned in lun6isp0:0:
Aug 27 10:36:28 quattro kernel: GEOM_MULTIPATH: da32 removed from lun63:
Aug 27 10:36:28 quattro kernel: 6): GEOM_MULTIPATH: da32a orphaned in lun6lost device
Aug 27 10:36:28 quattro kernel: GEOM_MULTIPATH: da32a removed from lun6
Aug 27 10:36:28 quattro kernel: (da32:
Aug 27 10:36:28 quattro kernel: isp0:0:3:6): removing device entry
Aug 27 10:36:28 quattro kernel: (da33:GEOM_MULTIPATH: da33 orphaned in lun7isp0:0:
Aug 27 10:36:28 quattro kernel: GEOM_MULTIPATH: da33 removed from lun73:
Aug 27 10:36:28 quattro kernel: 7): lost device
Aug 27 10:36:28 quattro kernel: (da33:isp0:0:3:7): removing device entry
Aug 27 10:36:28 quattro kernel: (da34:GEOM_MULTIPATH: da34 orphaned in lun8isp0:0:
Aug 27 10:36:28 quattro kernel: GEOM_MULTIPATH: da34 removed from lun83:
Aug 27 10:36:28 quattro kernel: 8): lost device
Aug 27 10:36:28 quattro kernel: (da34:isp0:0:3:8): removing device entry
Aug 27 10:36:28 quattro kernel: (da35:GEOM_MULTIPATH: da35 orphaned in lun9isp0:0:
Aug 27 10:36:28 quattro kernel: GEOM_MULTIPATH: da35 removed from lun93:
Aug 27 10:36:28 quattro kernel: 9): lost device
Aug 27 10:36:28 quattro kernel: (da35:isp0:0:3:9): removing device entry
Citrix XenServer
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdo, sector 32
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdo, sector 8001536
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdo, sector 16003072
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdt, sector 32
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdt, sector 8001536
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdt, sector 16003072
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sde, sector 32
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sde, sector 8001536
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sde, sector 16003072
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdj, sector 32
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sde, sector 16003072
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdj, sector 32
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdj, sector 8001536
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdj, sector 16003072
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdo, sector 32
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdo, sector 8001536
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdo, sector 16003072
Aug 26 16:10:21 due kernel: end_request: I/O error, dev sdt, sector 32
<....>
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 30: write 0x0020 secs to 0xaa0be720
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-62e67fd9-9029-bdad-2ea1-38ffd58c2ea3/VHD-a2811413-2e80-40ed-afeb-e3ffbbfeb6ae: op: 3, lsec: 2853281792, secs: 1, nbytes: 512, blk: 696602, blk_offset: 31839079
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 25: write 0x0020 secs to 0xaa11a4c0
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-62e67fd9-9029-bdad-2ea1-38ffd58c2ea3/VHD-a2811413-2e80-40ed-afeb-e3ffbbfeb6ae: op: 3, lsec: 2853658624, secs: 1, nbytes: 512, blk: 696694, blk_offset: 31843183
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 12: write 0x0020 secs to 0xaa176260
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-62e67fd9-9029-bdad-2ea1-38ffd58c2ea3/VHD-a2811413-2e80-40ed-afeb-e3ffbbfeb6ae: op: 2, lsec: 2849896992, secs: 32, nbytes: 16384, blk: 695775, blk_offset: 31802143
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 26: write 0x0020 secs to 0xa9ddfa20
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-62e67fd9-9029-bdad-2ea1-38ffd58c2ea3/VHD-a2811413-2e80-40ed-afeb-e3ffbbfeb6ae: op: 3, lsec: 2854035456, secs: 1, nbytes: 512, blk: 696786, blk_offset: 31851391
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 3: write 0x0020 secs to 0xaa1d2000
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-62e67fd9-9029-bdad-2ea1-38ffd58c2ea3/VHD-a2811413-2e80-40ed-afeb-e3ffbbfeb6ae: op: 3, lsec: 2854408192, secs: 1, nbytes: 512, blk: 696877, blk_offset: 31855495
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 28: write 0x0020 secs to 0xaa22dda0
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-62e67fd9-9029-bdad-2ea1-38ffd58c2ea3/VHD-a2811413-2e80-40ed-afeb-e3ffbbfeb6ae: op: 3, lsec: 2854785024, secs: 1, nbytes: 512, blk: 696969, blk_offset: 31859599
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 4: write 0x0020 secs to 0xaa289b40
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-62e67fd9-9029-bdad-2ea1-38ffd58c2ea3/VHD-a2811413-2e80-40ed-afeb-e3ffbbfeb6ae: op: 2, lsec: 2849144544, secs: 32, nbytes: 16384, blk: 695591, blk_offset: 31793935
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 27: write 0x0020 secs to 0xa9d27ee0
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-62e67fd9-9029-bdad-2ea1-38ffd58c2ea3/VHD-a2811413-2e80-40ed-afeb-e3ffbbfeb6ae: op: 3, lsec: 2855161856, secs: 1, nbytes: 512, blk: 697061, blk_offset: 31863703
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 13: write 0x0020 secs to 0xaa2e58e0
Aug 27 10:37:17 due TAPDISK[15905]: ERROR: errno -5 at vhd_complete: /dev/VG_XenStorage-62e67fd9-9029-bdad-2ea1-38ffd58c2ea3/VHD-a2811413-2e80-40ed-afeb-e3ffbbfeb6ae: op: 3, lsec: 2855538688, secs: 1, nbytes: 512, blk: 697153, blk_offset: 31867807
Qlogic switch
1 Fri Aug 27 10:31:01 MSD 2010 SANbox Port [8600.001E][Port: 7]PortID 0x10700 PortWWN 21:00:00:24:ff:03:c6:1a logged out of nameserver.
1 Fri Aug 27 10:31:01 MSD 2010 SANbox Port [8600.0020][Port: 7]SYNC_LOSS
1 Fri Aug 27 10:31:13 MSD 2010 SANbox Port [8600.001F][Port: 7]SYNC_ACQ
1 Fri Aug 27 10:31:13 MSD 2010 SANbox Port [8600.001D][Port: 7]PortID 0x10700 PortWWN 21:00:00:24:ff:03:c6:1a logged into nameserver.
1 Fri Aug 27 10:32:01 MSD 2010 SANbox Port [8600.001E][Port: 7]PortID 0x10700 PortWWN 21:00:00:24:ff:03:c6:1a logged out of nameserver.
1 Fri Aug 27 10:32:01 MSD 2010 SANbox Port [8600.0020][Port: 7]SYNC_LOSS
1 Fri Aug 27 10:32:02 MSD 2010 SANbox Port [8600.001F][Port: 7]SYNC_ACQ
1 Fri Aug 27 10:32:02 MSD 2010 SANbox Port [8600.001D][Port: 7]PortID 0x10700 PortWWN 21:00:00:24:ff:03:c6:1a logged into nameserver.
1 Fri Aug 27 10:36:20 MSD 2010 SANbox Port [8600.001E][Port: 3]PortID 0x10300 PortWWN 22:00:00:50:cc:20:57:ae logged out of nameserver.
1 Fri Aug 27 10:36:20 MSD 2010 SANbox Port [8600.0020][Port: 3]SYNC_LOSS
1 Fri Aug 27 10:36:27 MSD 2010 SANbox Port [8600.001F][Port: 3]SYNC_ACQ
1 Fri Aug 27 10:36:27 MSD 2010 SANbox Port [8600.001D][Port: 3]PortID 0x10300 PortWWN 21:00:00:1b:32:11:07:f8 logged into nameserver.
1 Fri Aug 27 10:36:37 MSD 2010 SANbox Port [8600.001E][Port: 3]PortID 0x10300 PortWWN 21:00:00:1b:32:11:07:f8 logged out of nameserver.
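To line the switch events up against the host-side errors, the SYNC_LOSS / SYNC_ACQ pairs in the switch log can be turned into per-port outage intervals. Below is a minimal Python sketch of that, not a finished tool: the function name `sync_outages` and the regular expression are my own, and it assumes the log lines look exactly like the excerpt above and that the script runs under an English/C locale (for the day and month names).

```python
import re
from datetime import datetime

# Matches switch log lines such as:
# "1 Fri Aug 27 10:31:01 MSD 2010 SANbox Port [8600.0020][Port: 7]SYNC_LOSS"
# (format assumed from the excerpt above; the time zone token is skipped)
LINE_RE = re.compile(
    r"(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2}) \w+ (?P<year>\d{4}) "
    r".*\[Port: (?P<port>\d+)\](?P<event>SYNC_LOSS|SYNC_ACQ)"
)


def sync_outages(lines):
    """Return (port, loss_time, seconds_down) for each SYNC_LOSS/SYNC_ACQ pair."""
    pending = {}   # port -> time of the last unmatched SYNC_LOSS
    outages = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # nameserver login/logout lines and anything else
        # %a/%b parsing assumes English day and month names
        when = datetime.strptime(
            f"{m['ts']} {m['year']}", "%a %b %d %H:%M:%S %Y"
        )
        port = int(m["port"])
        if m["event"] == "SYNC_LOSS":
            pending[port] = when
        elif port in pending:
            start = pending.pop(port)
            outages.append((port, start, int((when - start).total_seconds())))
    return outages
```

Calling `sync_outages(open("switch.log"))` (the file name is only a placeholder) returns a list that can be compared against the timestamps of the "lost device" and "I/O error" messages on the hosts.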
Naturally, all of this leads to the most unfortunate consequences.
Has anyone run into this problem and can suggest where to look for the cause and how to fix it?
Thanks in advance,
Ruslan.