Checking a Server's Disks for Bad Sectors

2024-07-22

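Cause: running cd or ls on file paths responded extremely slowly, and killing the related processes did not help.

GitLab went down (slow disk response):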

root@demo18:~# tail  /var/log/gitlab/unicorn/unicorn_stderr.log -f
I, [2024-07-22T13:36:25.674206 #41065]  INFO -- : master complete
I, [2024-07-22T13:36:27.064958 #47425]  INFO -- : Refreshing Gem list
I, [2024-07-22T13:36:45.467352 #47425]  INFO -- : listening on addr=127.0.0.1:8181 fd=17
I, [2024-07-22T13:36:45.467488 #47425]  INFO -- : unlinking existing socket=/var/opt/gitlab/gitlab-rails/sockets/gitlab.socket
I, [2024-07-22T13:36:45.467569 #47425]  INFO -- : listening on addr=/var/opt/gitlab/gitlab-rails/sockets/gitlab.socket fd=18
I, [2024-07-22T13:36:45.523851 #47594]  INFO -- : worker=0 ready
I, [2024-07-22T13:36:45.527721 #47597]  INFO -- : worker=1 ready
I, [2024-07-22T13:36:45.533209 #47600]  INFO -- : worker=2 ready
I, [2024-07-22T13:36:45.534730 #47425]  INFO -- : master process ready
I, [2024-07-22T13:36:45.538788 #47603]  INFO -- : worker=3 ready
E, [2024-07-22T13:37:54.549813 #47425] ERROR -- : worker=3 PID:47603 timeout (61s > 60s), killing
E, [2024-07-22T13:37:54.580907 #47425] ERROR -- : reaped #<Process::Status: pid 47603 SIGKILL (signal 9)> worker=3
I, [2024-07-22T13:37:54.594292 #48375]  INFO -- : worker=3 ready
E, [2024-07-22T13:38:03.600771 #47425] ERROR -- : worker=0 PID:47594 timeout (61s > 60s), killing
E, [2024-07-22T13:38:03.629095 #47425] ERROR -- : reaped #<Process::Status: pid 47594 SIGKILL (signal 9)> worker=0
I, [2024-07-22T13:38:03.641272 #48485]  INFO -- : worker=0 ready
E, [2024-07-22T13:38:05.640754 #47425] ERROR -- : worker=2 PID:47600 timeout (61s > 60s), killing
E, [2024-07-22T13:38:05.668951 #47425] ERROR -- : reaped #<Process::Status: pid 47600 SIGKILL (signal 9)> worker=2
I, [2024-07-22T13:38:05.682336 #48492]  INFO -- : worker=2 ready
E, [2024-07-22T13:38:38.686827 #47425] ERROR -- : worker=1 PID:47597 timeout (61s > 60s), killing
E, [2024-07-22T13:38:38.712657 #47425] ERROR -- : reaped #<Process::Status: pid 47597 SIGKILL (signal 9)> worker=1
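
The log shows the Unicorn master repeatedly killing workers that exceed the 60-second timeout. A minimal sketch for counting those kills in a log file follows; `count_worker_timeouts` is a hypothetical helper name, and the pattern is based only on the log lines shown here.

```shell
#!/bin/sh
# Hypothetical helper: count "timeout (...), killing" lines, i.e. workers the
# Unicorn master killed for exceeding the timeout.
# Reads the files given as arguments, or stdin when none are given.
count_worker_timeouts() {
    grep -c 'timeout (.*), killing' "$@"
}
```

For example: `count_worker_timeouts /var/log/gitlab/unicorn/unicorn_stderr.log`. A count that keeps growing while the application is idle points at something below the application, such as slow storage.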

HDD/SSD

MegaCli -PDlist -aAll | grep -i "Media Error Count"
MegaCli -PDlist -aAll | grep -i "Predictive Failure Count"
MegaCli -PDlist -aAll | grep -i "Other Error Count"

perccli /c0/eall/sall show all  | grep -i "Media Error Count"
perccli /c0/eall/sall show all  | grep -i "Predictive Failure Count"
perccli /c0/eall/sall show all  | grep -i "Other Error Count"

mkdir -p /tmp/raid
perccli /c0 show termlog > /tmp/raid/termlog.txt
perccli /c0 show events > /tmp/raid/events.txt
perccli /c0 show all > /tmp/raid/all.txt
perccli /c0 show alilog > /tmp/raid/alilog.txt
perccli /c0/vall show all > /tmp/raid/vall.txt

Check whether any of these counters is non-zero. If one is, run the following commands:

MegaCli -PDlist -aAll | grep -i "Media Error Count"
MegaCli -PDlist -aAll | grep -i "Predictive Failure Count"
MegaCli -PDlist -aAll | grep -i "Other Error Count"

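The check showed that the disk in slot 4 has bad sectors.
GitLab pulls were usable, but pushes still stalled. This server runs RAID 5; after hot-removing the disk in slot 4, push speed returned to normal.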

root@demo18:/data/shell# MegaCli -PDlist -aAll | grep -i "Media Error Count"
Media Error Count: 0
Media Error Count: 0
Media Error Count: 0
Media Error Count: 0
Media Error Count: 8831
Media Error Count: 0
Media Error Count: 0
Media Error Count: 0
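
The grep output lists the counters in drive order but drops the slot numbers, so it helps to print both together. This is a sketch, assuming the `-PDList` output contains a `Slot Number:` line before each drive's `Media Error Count:` line, as the `-PDInfo` output in this post does. `slot_media_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `MegaCli -PDList -aAll` output on stdin and print
# one line per drive in the form:
#   slot=<Slot Number> media_errors=<Media Error Count>
# Assumes "Slot Number:" appears before "Media Error Count:" for each drive.
slot_media_errors() {
    awk -F': *' '
        /^Slot Number:/       { slot = $2 + 0 }
        /^Media Error Count:/ { printf "slot=%d media_errors=%d\n", slot, $2 + 0 }
    '
}
```

Usage: `MegaCli -PDList -aAll | slot_media_errors`.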

root@demo18:~# MegaCli -PDInfo -PhysDrv [32:4] -aAll

Enclosure Device ID: 32
Slot Number: 4
Drive's position: DiskGroup: 0, Span: 0, Arm: 4
Enclosure position: 1
Device Id: 4
WWN: 500003972bd811bd
Sequence Number: 2
Media Error Count: 8945
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 5.458 TB [0x2baa0f4b0 Sectors]
Non Coerced Size: 5.457 TB [0x2ba90f4b0 Sectors]
Coerced Size: 5.457 TB [0x2ba900000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  4096
Firmware state: Online, Spun Up
Device Firmware Level: FS6D
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x4433221105000000
Connected Port Number: 7(path0) 
Inquiry Data:         768EK28EFE1CTOSHIBA MG04ACA600E                         FS6D
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :36C (96.80 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No

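Media Error Count: the disk has media errors, most likely bad sectors. The larger the value, the greater the risk. Depending on the disk's condition, a value above 0 generally warrants requesting a replacement, because once a disk starts failing the count keeps climbing.
Predictive Failure Count: the number of predictive-failure warnings for the disk. A value above 0 generally warrants requesting a replacement.
Last Predictive Failure Event Seq Number: the event sequence number of the most recent predictive-failure warning. If this value is non-zero, Predictive Failure Count must be non-zero as well.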
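
The "above 0" rule can be turned into a check suitable for a cron job or monitoring probe. The sketch below assumes the MegaCli field names shown in this post. `check_pd_counters` is a hypothetical helper name, and flagging Other Error Count at any non-zero value is an assumption taken from the "any counter is non-zero" step earlier, not a vendor threshold.

```shell
#!/bin/sh
# Hypothetical helper: read `MegaCli -PDList -aAll` output on stdin, print every
# slot/counter pair whose value is above 0, and return 1 if any was found
# (0 otherwise).
check_pd_counters() {
    awk -F': *' '
        /^Slot Number:/ { slot = $2 + 0 }
        /^(Media Error Count|Other Error Count|Predictive Failure Count):/ {
            if ($2 + 0 > 0) {
                printf "slot %d: %s = %d\n", slot, $1, $2 + 0
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}
```

Usage: `MegaCli -PDList -aAll | check_pd_counters || echo "disk needs attention"`.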

NVMe

nvme list
nvme smart-log /dev/nvme0n1
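
For NVMe drives, the closest equivalents to the counters above are the `critical_warning` and `media_errors` fields of `nvme smart-log`. This is a sketch, assuming the plain-text `key : value` layout with decimal values; the output format differs between nvme-cli versions, so treat the parsing as an assumption. `nvme_health_flags` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `nvme smart-log <device>` text output on stdin,
# print critical_warning / media_errors when above 0, and return 1 if either
# is non-zero (0 otherwise). Assumes "key : value" lines with decimal values.
nvme_health_flags() {
    awk -F':' '
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)    # strip padding around the field name
            gsub(/[ \t,]/, "", val)   # strip padding and thousands separators
        }
        key == "critical_warning" || key == "media_errors" {
            if (val + 0 > 0) { printf "%s = %d\n", key, val + 0; bad = 1 }
        }
        END { exit bad + 0 }
    '
}
```

Usage: `nvme smart-log /dev/nvme0n1 | nvme_health_flags`.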

