NetApp storage failure leaves the database on the midrange server unreachable
1. Use sysconfig -r to check the system and disk status
SBJYJ-02> sysconfig -r
Aggregate aggr0 (online, raid_dp, degraded, hybrid_enabled) (block checksums)
Plex /aggr0/plex0 (online, normal, active)
RAID group /aggr0/plex0/rg0 (degraded, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0b.10.1 0b 10 1 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
parity 0b.20.1 0b 20 1 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.17 0b 20 17 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.3 0b 10 3 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.3 0b 20 3 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.3 4c 30 3 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.5 0b 10 5 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.5 0b 20 5 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.5 4c 30 5 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.7 0b 10 7 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
data 4c.30.7 4c 30 7 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.9 0b 10 9 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.9 0b 20 9 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.9 4c 30 9 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.11 0b 10 11 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.11 0b 20 11 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
RAID group /aggr0/plex0/rg1 (double degraded, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 4c.30.11 4c 30 11 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
parity 0b.20.13 0b 20 13 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.13 0b 10 13 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.13 4c 30 13 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.15 0b 20 15 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
data 4c.30.15 4c 30 15 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.23 0b 10 23 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.1 4c 30 1 SA:A 0 SAS 15000 560000/1146880000 560879/1148681096
data 4c.30.17 4c 30 17 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.19 0b 20 19 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
data 4c.30.19 4c 30 19 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.21 0b 20 21 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.21 0b 10 21 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.15 0b 10 15 SA:B 0 SAS 15000 560000/1146880000 560879/1148681096
data 0b.20.23 0b 20 23 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.1 0a 0 1 SA:B 0 SSD N/A 190532/390209536 190782/390721968
spare 0a.00.3 0a 0 3 SA:B 0 SSD N/A 190532/390209536 190782/390721968
Broken disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
failed 0b.10.17 0b 10 17 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 0b.10.19 0b 10 19 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 0b.20.7 0b 20 7 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 4c.30.23 4c 30 23 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
Partner disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
partner 0b.10.22 0b 10 22 SA:B 0 SAS 15000 560000/1146880000 560879/1148681096
partner 4c.30.21 4c 30 21 SA:A 0 SAS 15000 0/0 560879/1148681096
partner 4c.30.22 4c 30 22 SA:A 0 SAS 15000 0/0 560879/1148681096
partner 4c.20.10 4c 20 10 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.2 0b 10 2 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.16 0b 10 16 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.20 0b 10 20 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.0 0b 10 0 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.18 0b 10 18 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.8 0b 10 8 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.14 4c 30 14 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.12 4c 20 12 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.6 4c 30 6 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.6 4c 20 6 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.20 4c 30 20 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.2 4c 20 2 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.16 4c 20 16 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.0 4c 30 0 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.16 4c 30 16 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.8 4c 20 8 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.2 4c 30 2 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.12 0b 10 12 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.14 0b 10 14 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.10 0b 10 10 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.4 0b 10 4 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.6 0b 10 6 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.14 4c 20 14 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.22 4c 20 22 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.12 4c 30 12 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.20 4c 20 20 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.18 4c 20 18 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.10 4c 30 10 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.18 4c 30 18 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.0 4c 20 0 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.4 4c 20 4 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.4 4c 30 4 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.8 4c 30 8 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4a.00.2 4a 0 2 SA:A 0 SSD N/A 0/0 190782/390721968
partner 4a.00.0 4a 0 0 SA:A 0 SSD N/A 0/0 190782/390721968
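The aggregate is configured as RAID-DP, which tolerates up to two failed disks per RAID group, so the initial diagnosis is disk failure with the data unaffected.
The output above shows four broken disks in total, spread across two RAID groups. vol status -r shows that the hot spares are used up, so the failed disks cannot be replaced from spares. The disks further down the listing (Partner disks) have not been taken over by controller 2, so their detailed status cannot be seen.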
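The counts read off the listing can be cross-checked with a short script. The following is a minimal Python sketch, assuming the sysconfig -r output has been saved as text with one row per line. The function name summarize_sysconfig_r and the shortened SAMPLE excerpt are illustrative and not part of any NetApp tool.

```python
import re
from collections import Counter

GROUP_HEADER = re.compile(r"RAID group (\S+) \(")
RAID_ROLES = {"data", "parity", "dparity"}

# Shortened excerpt in the same layout as the listing above (illustrative only).
SAMPLE = """\
Aggregate aggr0 (online, raid_dp, degraded, hybrid_enabled) (block checksums)
Plex /aggr0/plex0 (online, normal, active)
RAID group /aggr0/plex0/rg0 (degraded, block checksums)
dparity 0b.10.1 0b 10 1 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
RAID group /aggr0/plex0/rg1 (double degraded, block checksums)
data FAILED N/A 560000/ -
data 4c.30.15 4c 30 15 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
Pool1 spare disks (empty)
Broken disks
failed 0b.10.17 0b 10 17 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 0b.20.7 0b 20 7 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
Partner disks
partner 0b.10.22 0b 10 22 SA:B 0 SAS 15000 560000/1146880000 560879/1148681096
"""


def summarize_sysconfig_r(text):
    """Return (FAILED slots per RAID group, devices listed under 'Broken disks')."""
    failed_slots = Counter()
    broken = []
    group = None
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        fields = line.split()
        if not fields:
            continue
        header = GROUP_HEADER.match(line)
        if header:
            group, section = header.group(1), "group"
            failed_slots[group] += 0  # record the group even if nothing failed
        elif line.startswith("Broken disks"):
            section = "broken"
        elif line.startswith(("Pool", "Spare disks", "Partner disks")):
            section = "other"
        elif section == "group" and fields[0] in RAID_ROLES and fields[1:2] == ["FAILED"]:
            failed_slots[group] += 1
        elif section == "broken" and fields[0] == "failed" and len(fields) > 1:
            broken.append(fields[1])
    return dict(failed_slots), broken


if __name__ == "__main__":
    groups, broken = summarize_sysconfig_r(SAMPLE)
    print(groups)
    print(broken)
```

A RAID group with two FAILED slots has no parity protection left under RAID-DP, so it is the one to repair first.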
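2. Replace the disks owned by controller 2
① Pull out the failed disk and wait 30 seconds (the disk may still be spinning after it loses power; waiting avoids physical damage that would make data recovery harder), then insert the new disk (confirm that the amber LED is lit and the green LED is not blinking)
② After inserting the disk, run disk show -v to check whether the new disk has been assigned an owner (controller)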
DISK OWNER POOL SERIAL NUMBER DR HOME CHKSUM
------------ ------------- ----- ------------- ------------- -------
4a.00.0 esad (22312421) Pool0 S142NEAD806058 SBJYJ-01 (2017242430) Block
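4a.00.2 SBJYJ-01 (2017242430) Pool0 S142NEAD803850 SBJYJ-01 (2017242430) Block
Here the newly inserted disk does not belong to this controller; it still carries another controller's ownership or RAID information.
Use disk assign -f <disk_id> -s <owner_id> to force-assign it to a controller *use with caution. After the assignment, run disk show -v to check whether it succeeded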
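The ownership check can be scripted when many disks are involved. This is a minimal Python sketch, assuming each owned disk appears in disk show -v as "name (sysid)" in the OWNER column, as in the sample rows above. The function name foreign_disks and the expected_owner parameter are hypothetical, and rows in any other layout (for example unowned disks) are simply skipped.

```python
import re

# DISK, OWNER name (sysid), POOL, SERIAL NUMBER, HOME name (sysid)
ROW = re.compile(
    r"^(\S+)\s+(\S+)\s*\((\d+)\)\s+(\S+)\s+(\S+)\s+(\S+)\s*\((\d+)\)"
)

SAMPLE = """\
DISK OWNER POOL SERIAL NUMBER DR HOME CHKSUM
------------ ------------- ----- ------------- ------------- -------
4a.00.0 esad (22312421) Pool0 S142NEAD806058 SBJYJ-01 (2017242430) Block
4a.00.2 SBJYJ-01 (2017242430) Pool0 S142NEAD803850 SBJYJ-01 (2017242430) Block
"""


def foreign_disks(disk_show_text, expected_owner):
    """Return (disk, owner) pairs whose OWNER column is not expected_owner."""
    result = []
    for raw in disk_show_text.splitlines():
        match = ROW.match(raw.strip())
        if match is None:
            continue  # header, separator, or a row in an unexpected layout
        disk, owner = match.group(1), match.group(2)
        if owner != expected_owner:
            result.append((disk, owner))
    return result


if __name__ == "__main__":
    for disk, owner in foreign_disks(SAMPLE, "SBJYJ-01"):
        print(f"{disk} is owned by {owner}")
```

An empty result after disk assign suggests the assignment took effect; the sketch only reads text and never issues commands itself.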
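Use aggr destroy <aggr_name> to delete an aggregate *use with caution
③ Use vol status -r to check the disk status. If the disk shows bad label, go to step (1); if the disk is already listed under spare disks and its row ends with not zeroed, go to step (2)
(1) vol status -r shows a disk marked bad label even though the new disk has already been installed
First run priv set advanced to enter advanced mode, run disk unfail -s <disk_id 0b.**.**> to clear the bad label, then run priv set to leave advanced mode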
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
failed 0b.10.19 0b 10 19 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 0b.20.7 0b 20 7 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 4c.30.23 4c 30 23 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
bad label 0b.10.17 0b 10 17 SA:B 0 SAS 15000 560000/1146880000 560879/1148681096
(2) A disk under spare disks is followed by not zeroed
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.1 0a 0 1 SA:B 0 SSD N/A 190532/390209536 190782/390721968 not zeroed
spare 0a.00.3 0a 0 3 SA:B 0 SSD N/A 190532/390209536 190782/390721968
Run disk zero spares to zero all spare disks
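The choice between step (1) and step (2) can be expressed as a small decision helper. This is a minimal Python sketch, assuming the vol status -r output is available as text with one row per line. The function name suggest_actions is hypothetical, and it only returns the commands described above as strings for an operator to review; it does not run anything.

```python
SAMPLE = """\
failed 0b.10.19 0b 10 19 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
bad label 0b.10.17 0b 10 17 SA:B 0 SAS 15000 560000/1146880000 560879/1148681096
spare 0a.00.1 0a 0 1 SA:B 0 SSD N/A 190532/390209536 190782/390721968 not zeroed
spare 0a.00.3 0a 0 3 SA:B 0 SSD N/A 190532/390209536 190782/390721968
"""


def suggest_actions(vol_status_text):
    """Map 'bad label' rows and unzeroed spares to the commands from steps (1) and (2)."""
    bad_label = []
    needs_zeroing = False
    for line in vol_status_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "bad" and fields[1] == "label":
            bad_label.append(fields[2])  # third field is the device name
        elif fields and fields[0] == "spare" and "not zero" in line.lower():
            needs_zeroing = True  # substring also tolerates spelling variants
    steps = []
    if bad_label:
        steps.append("priv set advanced")
        steps.extend(f"disk unfail -s {device}" for device in bad_label)
        steps.append("priv set")
    if needs_zeroing:
        steps.append("disk zero spares")
    return steps


if __name__ == "__main__":
    for step in suggest_actions(SAMPLE):
        print(step)
```

Rows marked failed are deliberately ignored here, since those disks still need to be physically replaced first.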
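3. Bring controller 1 back up
Because of the disk failures, controller 1 shut itself down to protect the disks, so that continued user access could not damage more disks and cause data loss.
Connect to the storage system directly with a console cable; it drops to the LOADER> prompt. Use help to list the available commands, then use boot_ontap to boot the controller (if it does not boot, the controller itself may be faulty). Once the controller is up, log in to it and replace its disks in the same way.
Summary
Reconstruction and zeroing take some time. Once the disks have been reconstructed and zeroed, the RAID groups leave the degraded state automatically and the storage runs normally.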
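Waiting for the RAID groups to recover can be sketched as a simple polling loop. This is a minimal Python sketch under two assumptions that are not confirmed by the output above: that a healthy group reports "normal" in its header line, and that fetch is a caller-supplied function (hypothetical) returning the current sysconfig -r text. How that text is collected from the controller is left open.

```python
import re
import time

GROUP = re.compile(r"RAID group (\S+) \(([^)]*)\)")

DEGRADED = "RAID group /aggr0/plex0/rg1 (double degraded, block checksums)"
# Assumed appearance of a recovered group; not taken from the listing above.
HEALTHY = "RAID group /aggr0/plex0/rg1 (normal, block checksums)"


def unhealthy_groups(text):
    """Return {group: state} for every RAID group whose state lacks 'normal'."""
    pending = {}
    for match in GROUP.finditer(text):
        name, state = match.group(1), match.group(2)
        tokens = [token.strip() for token in state.split(",")]
        if "normal" not in tokens:
            pending[name] = state
    return pending


def wait_until_healthy(fetch, interval_s=60, max_polls=10, sleep=time.sleep):
    """Poll fetch() until no RAID group is unhealthy; False if polls run out."""
    for _ in range(max_polls):
        if not unhealthy_groups(fetch()):
            return True
        sleep(interval_s)
    return False


if __name__ == "__main__":
    outputs = iter([DEGRADED, HEALTHY])
    print(wait_until_healthy(lambda: next(outputs), interval_s=0))
```

Passing sleep as a parameter keeps the loop testable without real waiting; in practice a generous interval is sensible, since reconstruction of large disks can take hours.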
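Reference commands
sysconfig -v: show storage status
sysconfig -r: show disk status
sysconfig -a: show detailed system information
vol status -v: show volume status
vol status -f: check for failed disks
disk show: show disk assignment
disk show -v: show which controller owns each disk
storage show disk -p: show disk location
disk zero spares: zero all spare disks
environment status: check power supply and fan status
rdfile /etc/messages: check the latest log entries
cf status: check controller status
df: check volume space usage
environment status: show environment information
license: show license information
ifconfig -a: show network configuration
aggr status: show RAID group information
aggr status -r: show RAID group details
df -Vh: show volume space
df -Ah: show aggregate space
disk assign -f <disk_id> -s <owner_id>: force-assign a disk to a controller *use with caution
disk replace start disk_name spare_disk_name: replace a disk with a spare disk *use with caution
disk replace stop disk_name: stop a disk replacement *use with caution
disk sanitize start disk_name: erase all data on a disk *use with caution
disk sanitize abort disk_name: abort sanitization *use with caution
aggr destroy aggrname: delete an aggregate *use with caution
disk remove_ownership disk_name: remove a disk's owner *use with caution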
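When these commands are driven from a script, the ones marked "use with caution" can be gated behind an explicit confirmation. This is a minimal Python sketch; the prefix list mirrors the caution marks in the list above, and the function name needs_confirmation is hypothetical.

```python
# Command prefixes that the reference list marks as "use with caution".
CAUTION_PREFIXES = (
    "disk assign",
    "disk replace",
    "disk sanitize",
    "disk remove_ownership",
    "aggr destroy",
)


def needs_confirmation(command):
    """Return True if the command matches one of the caution-marked prefixes."""
    normalized = " ".join(command.split())  # collapse repeated whitespace
    return normalized.startswith(CAUTION_PREFIXES)


if __name__ == "__main__":
    for cmd in ("sysconfig -r", "disk zero spares", "aggr destroy aggr1"):
        print(cmd, needs_confirmation(cmd))
```

A prefix match is deliberately coarse: it flags every form of a risky command rather than trying to judge individual flags.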