Background.
The day before yesterday, the UPSes failed in one of the Moscow data centers, and as a result the array (its partitions) started having problems.
Configuration.
Code:
FreeBSD 8.1-RELEASE #0: Tue Dec 14 01:18:30 MSK 2010
Controller: Adaptec 3405 with a pair of Hitachi disks in RAID 1.
As it turned out, the write cache was enabled on both the disks and the controller.
After I brought the server back up, I disabled the write cache everywhere I could.
Code:
Jun 21 15:58:54 host kernel: aacd0: hard error cmd=write 105697311-105697342
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199703552, length=16384)]105697343-105697374error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199719936, length=16384)]105697375-105697406error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199736320, length=16384)]105697407-105697438error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199752704, length=16384)]105697439-105697470error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199769088, length=16384)]105697471-105697502error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199785472, length=16384)]105697503-105697534error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199801856, length=16384)]105697535-105697566error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199818240, length=16384)]105697567-105697598error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199834624, length=16384)]105697599-105697630error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199851008, length=16384)]105697631-105697662error = 5
Jun 21 15:58:54 host kernel:
Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write aacd0s1d[WRITE(offset=26199867392, length=16384)]105697663-105697694error = 5
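The kernel interleaves two messages in the lines above, so the byte offsets are hard to read at a glance. A minimal sketch for pulling them out and converting them to sectors, assuming 512-byte sectors (the function name and the sector size are assumptions, not something stated in the log):

```python
import re

# Matches the GEOM part of the message, e.g. "WRITE(offset=26199703552, length=16384)"
GEOM_RE = re.compile(r"(READ|WRITE)\(offset=(\d+), length=(\d+)\)")

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def failed_requests(log_text):
    """Return (op, first_sector, sector_count) per failed request.

    Sector numbers are relative to the start of the partition (aacd0s1d here),
    because that is what the GEOM offset refers to.
    """
    result = []
    for op, offset, length in GEOM_RE.findall(log_text):
        result.append((op, int(offset) // SECTOR_SIZE, int(length) // SECTOR_SIZE))
    return result


sample = (
    "Jun 21 15:58:54 host kernel: aacd0: hard error g_vfs_done():cmd=write "
    "aacd0s1d[WRITE(offset=26199703552, length=16384)]105697343-105697374error = 5"
)
print(failed_requests(sample))  # [('WRITE', 51171296, 32)]
```

Each failed request is 16384 bytes, i.e. 32 sectors, which matches the 32-block ranges the aacd driver prints (for example 105697311-105697342).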
We connected a KVM; the screen showed the usual "Mounting /etc/fstab filesystems failed".
I ran fsck -y. Messages like the following started pouring into the console in a loop.
Code:
Jun 21 19:28:42 host kernel: aacd0: hard error cmd=read 334444767-334444798
Jun 21 19:28:42 host kernel: aacd0: hard error cmd=read fsbn 334444767
Jun 21 19:28:42 host kernel: aacd0: hard error cmd=read fsbn 334444768
Jun 21 19:28:42 host kernel: aacd0: hard error cmd=read fsbn 334444769
Jun 21 19:28:42 host kernel: aacd0: hard error cmd=read fsbn 334444770
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444771
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444772
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444773
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444774
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444775
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444776
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444777
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444778
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444779
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444780
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444781
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444782
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444783
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444784
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444785
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444786
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444787
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444788
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444789
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444790
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444791
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444792
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444793
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444794
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444795
Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn 334444796
Jun 21 19:28:44 host kernel: aacd0: hard error cmd=read fsbn 334444797
Jun 21 19:28:44 host kernel: aacd0: hard error cmd=read fsbn 334444798
Jun 21 19:28:44 host kernel: aacd0: hard error cmd=read 334444767-334444798
Jun 21 19:28:44 host kernel: aacd0: hard error cmd=read fsbn 334444767
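Since fsck keeps retrying the same blocks, the flood of lines hides how few distinct blocks are actually failing. A rough sketch that collapses the per-block `fsbn` lines into contiguous ranges (the function name is hypothetical; the regex is based only on the message format shown above):

```python
import re

# Matches per-block lines such as "hard error cmd=read fsbn 334444767"
FSBN_RE = re.compile(r"hard error cmd=(?:read|write) fsbn (\d+)")


def bad_block_ranges(log_text):
    """Collapse all reported fsbn numbers into sorted, contiguous (first, last) ranges."""
    blocks = sorted({int(n) for n in FSBN_RE.findall(log_text)})
    ranges = []
    for block in blocks:
        if ranges and block == ranges[-1][1] + 1:
            ranges[-1][1] = block  # extend the current range
        else:
            ranges.append([block, block])  # start a new range
    return [tuple(r) for r in ranges]


sample = "\n".join(
    f"Jun 21 19:28:43 host kernel: aacd0: hard error cmd=read fsbn {n}"
    for n in range(334444767, 334444799)
)
print(bad_block_ranges(sample))  # [(334444767, 334444798)]
```

For the excerpt above this yields a single range, 334444767-334444798, the same 32 blocks the driver reports in its summary line.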
Below is the arcconf output.
Code:
/root/arcconf GETCONFIG 1
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Optimal
Channel description : SAS/SATA
Controller Model : Adaptec 3405
Controller Serial Number : 8C3910AD6A5
Physical Slot : 6
Temperature : 55 C/ 131 F (Normal)
Installed memory : 128 MB
Copyback : Disabled
Background consistency check : Disabled
Automatic Failover : Enabled
Global task priority : High
Stayawake period : Disabled
Spinup limit internal drives : 0
Spinup limit external drives : 0
Defunct disk drive count : 0
Logical devices/Failed/Degraded : 2/0/0
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 5.2-0 (15728)
Firmware : 5.2-0 (15728)
Driver : 2.1-9 (1)
Boot Flash : 5.2-0 (15728)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status : Not Installed
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name : Vol01
RAID level : 1
Status of logical device : Optimal
Size : 953334 MB
Read-cache mode : Enabled
Write-cache mode : Disabled (write-through)
Write-cache setting : Disabled (write-through)
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : Yes
Power settings : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0 : Present (0,0) STF604MH0UE9WB
Segment 1 : Present (0,1) STF604MH0W47KB
Logical device number 1
Logical device name : Vol02
RAID level : Simple_volume
Status of logical device : Optimal
Size : 953334 MB
Read-cache mode : Enabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back)
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : No
Failed stripes : No
Power settings : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0 : Present (0,2) WD-WMATV3932883
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,0(0:0)
Reported Location : Connector 0, Device 0
Vendor : Hitachi
Model : HDT721010SLA360
Firmware : ST6OA31B
Serial number : STF604MH0UE9WB
Size : 953869 MB
Write Cache : Disabled (write-through)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Disabled
Device #1
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,1(1:0)
Reported Location : Connector 0, Device 1
Vendor : Hitachi
Model : HDT721010SLA360
Firmware : ST6OA31B
Serial number : STF604MH0W47KB
Size : 953869 MB
Write Cache : Disabled (write-through)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Disabled
Device #2
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,2(2:0)
Reported Location : Connector 0, Device 2
Vendor : WDC
Model : WD1001FALS-00J7B
Firmware : 05.00K05
Serial number : WD-WMATV3932883
Size : 953869 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Disabled
Command completed successfully.
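Note that in the output above logical device 0 (Vol01, the RAID 1) reports "Failed stripes : Yes" even though its status is Optimal. A small sketch for spotting that field automatically in saved `arcconf GETCONFIG` output; it assumes the "Key : Value" layout shown above, and the function name is hypothetical:

```python
def logical_devices_with_failed_stripes(arcconf_text):
    """Return the numbers of logical devices whose 'Failed stripes' field is 'Yes'."""
    flagged = []
    current = None  # logical device number currently being parsed
    for raw in arcconf_text.splitlines():
        line = raw.strip()
        if line.startswith("Logical device number"):
            current = int(line.split()[-1])
        elif line.startswith("Failed stripes") and current is not None:
            value = line.split(":", 1)[1].strip()
            if value == "Yes":
                flagged.append(current)
    return flagged


sample = """\
Logical device number 0
Logical device name : Vol01
Status of logical device : Optimal
Failed stripes : Yes
Logical device number 1
Logical device name : Vol02
Status of logical device : Optimal
Failed stripes : No
"""
print(logical_devices_with_failed_stripes(sample))  # [0]
```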