[arch-general] btrfs raid 10 fileserver with ata errors
niya levi
niyalevi at gmail.com
Sat Jan 14 09:39:01 UTC 2017
> after some googling it's been suggested that it's either a hard drive,
> the sata controller or the sata cables.
> how do i go about diagnosing and fixing the problem,
> any suggestions or guidance would be appreciated.
> shadrock
> I've had this problem before. IIRC, you can match up the ata17.00 with what drive it's talking about by looking at your kernel boot messages. The first thing I would do is switch out the SATA cable and see if the problem persists. If that doesn't work, run a scan of the drive using the manufacturer's scan program.
>
Hi everyone,
These are the tests I've tried so far, and their results:
journalctl -f | grep ata
Jan 13 12:37:13 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:37:13 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:37:13 maybel kernel: ata17.00: cmd 25/00:00:00:24:7d/00:07:1a:00:00/e0 tag 13 dma 917504 in
Jan 13 12:37:13 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:37:13 maybel kernel: ata17: hard resetting link
Jan 13 12:37:13 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:37:13 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:37:13 maybel kernel: ata17: EH complete
Jan 13 12:37:45 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:37:45 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:37:45 maybel kernel: ata17.00: cmd 25/00:00:00:d9:7d/00:07:1a:00:00/e0 tag 25 dma 917504 in
Jan 13 12:37:45 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:37:45 maybel kernel: ata17: hard resetting link
Jan 13 12:37:46 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:37:46 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:37:46 maybel kernel: ata17: EH complete
Jan 13 12:38:19 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:38:19 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:38:19 maybel kernel: ata17.00: cmd 25/00:00:80:d6:81/00:06:1a:00:00/e0 tag 1 dma 786432 in
Jan 13 12:38:19 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:38:19 maybel kernel: ata17: hard resetting link
Jan 13 12:38:20 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:38:20 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:38:20 maybel kernel: ata17: EH complete
Jan 13 12:38:52 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:38:52 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:38:52 maybel kernel: ata17.00: cmd 25/00:80:80:d1:82/00:05:1a:00:00/e0 tag 28 dma 720896 in
Jan 13 12:38:52 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:38:52 maybel kernel: ata17: hard resetting link
Jan 13 12:38:52 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:38:52 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:38:52 maybel kernel: ata17: EH complete
Jan 13 12:39:24 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:39:24 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:39:24 maybel kernel: ata17.00: cmd 25/00:00:00:9d:84/00:05:1a:00:00/e0 tag 1 dma 655360 in
Jan 13 12:39:24 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:39:24 maybel kernel: ata17: hard resetting link
Jan 13 12:39:25 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:39:25 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:39:25 maybel kernel: ata17: EH complete
Jan 13 12:39:57 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:39:57 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:39:57 maybel kernel: ata17.00: cmd 25/00:00:80:b8:85/00:05:1a:00:00/e0 tag 6 dma 655360 in
Jan 13 12:39:57 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:39:57 maybel kernel: ata17: hard resetting link
Jan 13 12:39:57 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:39:57 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:39:57 maybel kernel: ata17: EH complete
Jan 13 12:40:29 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:40:29 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:40:29 maybel kernel: ata17.00: cmd 25/00:80:80:cd:85/00:05:1a:00:00/e0 tag 16 dma 720896 in
Jan 13 12:40:29 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:40:29 maybel kernel: ata17: hard resetting link
Jan 13 12:40:30 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:40:30 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:40:30 maybel kernel: ata17: EH complete
Jan 13 12:41:02 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:41:02 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:41:02 maybel kernel: ata17.00: cmd 25/00:80:80:f2:85/00:05:1a:00:00/e0 tag 16 dma 720896 in
Jan 13 12:41:02 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:41:02 maybel kernel: ata17: hard resetting link
Jan 13 12:41:02 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:41:02 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:41:02 maybel kernel: ata17: EH complete
^C
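A note for anyone reading the "cmd" lines above: the hex dump can be decoded by hand. The sketch below assumes the usual libata layout (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and a 512-byte sector; the function name decode_ata_cmd is made up for illustration. As a sanity check on that assumption, the first failed command decodes to 1792 sectors, i.e. 917504 bytes, which matches the "dma 917504" printed on the same log line.

```shell
# Decode the taskfile dump libata prints after "cmd" for a failed command.
# Assumed layout (12 hex bytes):
#   CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HFEAT:HNSECT:HLBAL:HLBAM:HLBAH/DEV
# decode_ata_cmd is a hypothetical helper, not part of any package.
decode_ata_cmd() {
    # Split the argument on "/" and ":" into its twelve fields.
    IFS='/:' read -r cmd feat nsect lbal lbam lbah \
                     hfeat hnsect hlbal hlbam hlbah dev <<EOF
$1
EOF
    # 48-bit LBA: high-order ("hob") bytes first, then the low-order bytes.
    lba=$(( 0x${hlbah}${hlbam}${hlbal}${lbah}${lbam}${lbal} ))
    # 16-bit sector count (a value of 0 would mean 65536; not handled here).
    count=$(( 0x${hnsect}${nsect} ))
    echo "opcode=0x$cmd lba=$lba sectors=$count bytes=$(( count * 512 ))"
}

decode_ata_cmd '25/00:00:00:24:7d/00:07:1a:00:00/e0'
# prints: opcode=0x25 lba=444408832 sectors=1792 bytes=917504
```

Opcode 0x25 is READ DMA EXT, which agrees with the "failed command" line in the log.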
=======================================================================================================
[alarm@maybel ~]$ ls -l /sys/block/ | grep sd.
lrwxrwxrwx 1 root root 0 Jan 9 14:21 sda -> ../devices/pci0000:00/0000:00:05.0/ata1/host0/target0:0:0/0:0:0:0/block/sda
lrwxrwxrwx 1 root root 0 Jan 9 14:26 sdb -> ../devices/pci0000:00/0000:00:0c.0/0000:02:00.0/ata13/host12/target12:0:0/12:0:0:0/block/sdb
lrwxrwxrwx 1 root root 0 Jan 9 14:26 sdc -> ../devices/pci0000:00/0000:00:0c.0/0000:02:00.0/ata14/host13/target13:0:0/13:0:0:0/block/sdc
lrwxrwxrwx 1 root root 0 Jan 9 14:26 sdd -> ../devices/pci0000:00/0000:00:0c.0/0000:02:00.0/ata15/host14/target14:0:0/14:0:0:0/block/sdd
lrwxrwxrwx 1 root root 0 Jan 9 14:26 sde -> ../devices/pci0000:00/0000:00:0c.0/0000:02:00.0/ata16/host15/target15:0:0/15:0:0:0/block/sde
lrwxrwxrwx 1 root root 0 Jan 9 14:26 sdf -> ../devices/pci0000:00/0000:00:0d.0/0000:03:00.0/ata17/host16/target16:0:0/16:0:0:0/block/sdf
lrwxrwxrwx 1 root root 0 Jan 9 14:26 sdg -> ../devices/pci0000:00/0000:00:0d.0/0000:03:00.0/ata18/host17/target17:0:0/17:0:0:0/block/sdg
lrwxrwxrwx 1 root root 0 Jan 10 02:43 sdh -> ../devices/pci0000:00/0000:00:02.1/usb1/1-6/1-6:1.0/host20/target20:0:0/20:0:0:0/block/sdh
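The ataN-to-sdX mapping can also be pulled out of these link targets mechanically rather than read by eye. This is only a sketch: ata_to_disk is a made-up helper name, and it assumes the sysfs path contains an "ataN" component followed later by "block/sdX", as in the listing above. Devices without an ata component (such as the USB disk) are skipped.

```shell
# Print "ataN sdX" pairs from sysfs link targets read on stdin.
# ata_to_disk is a hypothetical helper; lines with no ataN path
# component (e.g. USB storage) produce no output.
ata_to_disk() {
    sed -nE 's#.*/(ata[0-9]+)/.*/block/([^/]+)$#\1 \2#p'
}

# Usage sketch on a live system:
#   for d in /sys/block/sd*; do readlink -f "$d"; done | ata_to_disk
```

Fed the sdf line above, it prints "ata17 sdf".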
=======================================================================================================
sudo smartctl -i /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: HITACHI HUA722010ALA330
Serial Number: N136GXML
LU WWN Device Id: 5 000cca 39ced38c2
Firmware Version: JP4ONA00
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 1.5 Gb/s
Local Time is: Fri Jan 13 12:59:51 2017 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=======================================================================================================
sudo smartctl -t short /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Fri Jan 13 13:31:48 2017
Use smartctl -X to abort test.
=======================================================================================================
sudo smartctl -a /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: HITACHI HUA722010ALA330
Serial Number: N136GXML
LU WWN Device Id: 5 000cca 39ced38c2
Firmware Version: JP4ONA00
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 1.5 Gb/s
Local Time is: Fri Jan 13 13:34:00 2017 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  41) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                ( 9929) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 166) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       91
  3 Spin_Up_Time            0x0007   130   130   024    Pre-fail  Always       -       278 (Average 305)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       69
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   138   138   020    Pre-fail  Offline      -       31
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       10782
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       68
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       123
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       123
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Min/Max 12/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 0
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Interrupted (host reset)     90%        10782         -
# 2  Short offline       Interrupted (host reset)     90%        10776         -
# 3  Short offline       Interrupted (host reset)     90%        10752         -
# 4  Short offline       Interrupted (host reset)     90%        10728         -
# 5  Short offline       Completed without error      00%        10703         -
# 6  Short offline       Interrupted (host reset)     90%        10656         -
# 7  Short offline       Interrupted (host reset)     90%        10632         -
# 8  Extended offline    Interrupted (host reset)     90%        10628         -
# 9  Short offline       Interrupted (host reset)     90%        10608         -
#10  Short offline       Interrupted (host reset)     90%        10584         -
#11  Short offline       Interrupted (host reset)     90%        10560         -
#12  Short offline       Interrupted (host reset)     90%        10537         -
#13  Short offline       Interrupted (host reset)     90%        10513         -
#14  Short offline       Interrupted (host reset)     90%        10489         -
#15  Short offline       Interrupted (host reset)     90%        10465         -
#16  Extended offline    Interrupted (host reset)     90%        10461         -
#17  Short offline       Interrupted (host reset)     90%        10441         -
#18  Short offline       Interrupted (host reset)     90%        10417         -
#19  Short offline       Interrupted (host reset)     90%        10393         -
#20  Short offline       Completed without error      00%        10368         -
#21  Short offline       Completed without error      00%        10344         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
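To put a number on how often the self-tests are interrupted, the self-test log can be tallied with a small filter. This is a rough sketch: selftest_summary is a made-up name, and it assumes each log entry starts with "#" followed by the entry number, as in the smartctl output above.

```shell
# Tally SMART self-test log entries by outcome, reading smartctl output on stdin.
# selftest_summary is a hypothetical helper; it only looks at lines that start
# with "#<number> " and classifies them by the status text they contain.
selftest_summary() {
    awk '/^# *[0-9]+ / {
             if ($0 ~ /Completed without error/) ok++
             else if ($0 ~ /Interrupted/)        intr++
             else                                other++
         }
         END { printf "completed=%d interrupted=%d other=%d\n", ok, intr, other }'
}

# Usage sketch (needs root and a real device):
#   sudo smartctl -l selftest /dev/sdf | selftest_summary
```

On the log above, only entries 5, 20 and 21 show "Completed without error"; the other 18 of the 21 were interrupted by a host reset.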
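=======================================================================================================
lspci | grep SATA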
00:05.0 IDE interface: NVIDIA Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: NVIDIA Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: NVIDIA Corporation MCP55 SATA Controller (rev a3)
02:00.0 SATA controller: Marvell Technology Group Ltd. Device 9215 (rev 11)
03:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
03:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
04:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
04:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
=======================================================================================================
sudo lshw -c storage
*-usb
description: Mass storage device
product: AS2105
vendor: ASMedia
physical id: 6
bus info: usb@1:6
version: 0.01
serial: WD-WCC4N5AHPU2D
capabilities: usb-2.10 scsi
configuration: driver=usb-storage speed=480Mbit/s
*-ide:0
description: IDE interface
product: MCP55 IDE
vendor: NVIDIA Corporation
physical id: 7
bus info: pci@0000:00:04.0
version: a1
width: 32 bits
clock: 66MHz
capabilities: ide pm bus_master cap_list
configuration: driver=pata_amd latency=0 maxlatency=1 mingnt=3
resources: irq:0 ioport:1f0(size=8) ioport:3f6 ioport:170(size=8) ioport:376 ioport:f000(size=16)
*-ide:1
description: IDE interface
product: MCP55 SATA Controller
vendor: NVIDIA Corporation
physical id: 5
bus info: pci@0000:00:05.0
version: a3
width: 32 bits
clock: 66MHz
capabilities: ide pm msi ht bus_master cap_list
configuration: driver=sata_nv latency=0 maxlatency=1 mingnt=3
resources: irq:21 ioport:9f0(size=8) ioport:bf0(size=4) ioport:970(size=8) ioport:b70(size=4) ioport:dc00(size=16) memory:fe02d000-fe02dfff
*-ide:2
description: IDE interface
product: MCP55 SATA Controller
vendor: NVIDIA Corporation
physical id: 5.1
bus info: pci@0000:00:05.1
version: a3
width: 32 bits
clock: 66MHz
capabilities: ide pm msi ht bus_master cap_list
configuration: driver=sata_nv latency=0 maxlatency=1 mingnt=3
resources: irq:20 ioport:9e0(size=8) ioport:be0(size=4) ioport:960(size=8) ioport:b60(size=4) ioport:c800(size=16) memory:fe02c000-fe02cfff
*-ide:3
description: IDE interface
product: MCP55 SATA Controller
vendor: NVIDIA Corporation
physical id: 5.2
bus info: pci@0000:00:05.2
version: a3
width: 32 bits
clock: 66MHz
capabilities: ide pm msi ht bus_master cap_list
configuration: driver=sata_nv latency=0 maxlatency=1 mingnt=3
resources: irq:23 ioport:c400(size=8) ioport:c000(size=4) ioport:bc00(size=8) ioport:b800(size=4) ioport:b400(size=16) memory:fe02b000-fe02bfff
*-storage
description: SATA controller
product: Marvell Technology Group Ltd.
vendor: Marvell Technology Group Ltd.
physical id: 0
bus info: pci@0000:02:00.0
version: 11
width: 32 bits
clock: 33MHz
capabilities: storage pm msi pciexpress ahci_1.0 bus_master cap_list rom
configuration: driver=ahci latency=0
resources: irq:27 ioport:9c00(size=8) ioport:9800(size=4) ioport:9400(size=8) ioport:9000(size=4) ioport:8c00(size=32) memory:fdeff000-fdeff7ff memory:fdee0000-fdeeffff
*-storage
description: SATA controller
product: JMB363 SATA/IDE Controller
vendor: JMicron Technology Corp.
physical id: 0
bus info: pci@0000:03:00.0
version: 03
width: 32 bits
clock: 33MHz
capabilities: storage pm pciexpress ahci_1.0 bus_master cap_list rom
configuration: driver=ahci latency=0
resources: irq:16 memory:fddfe000-fddfffff memory:fdde0000-fddeffff
*-ide
description: IDE interface
product: JMB363 SATA/IDE Controller
vendor: JMicron Technology Corp.
physical id: 0.1
bus info: pci@0000:03:00.1
version: 03
width: 32 bits
clock: 33MHz
capabilities: ide pm bus_master cap_list
configuration: driver=pata_jmicron latency=0
resources: irq:16 ioport:7c00(size=8) ioport:7800(size=4) ioport:7400(size=8) ioport:7000(size=4) ioport:6c00(size=16)
*-storage
description: SATA controller
product: JMB363 SATA/IDE Controller
vendor: JMicron Technology Corp.
physical id: 0
bus info: pci@0000:04:00.0
version: 03
width: 32 bits
clock: 33MHz
capabilities: storage pm pciexpress ahci_1.0 bus_master cap_list rom
configuration: driver=ahci latency=0
resources: irq:16 memory:fdcfe000-fdcfffff memory:fdce0000-fdceffff
*-ide
description: IDE interface
product: JMB363 SATA/IDE Controller
vendor: JMicron Technology Corp.
physical id: 0.1
bus info: pci@0000:04:00.1
version: 03
width: 32 bits
clock: 33MHz
capabilities: ide pm bus_master cap_list
configuration: driver=pata_jmicron latency=0
resources: irq:16 ioport:5c00(size=8) ioport:5800(size=4) ioport:5400(size=8) ioport:5000(size=4) ioport:4c00(size=16)
*-scsi
physical id: 1
bus info: scsi@20
logical name: scsi20
capabilities: scsi-host
configuration: driver=usb-storage
=======================================================================================================
/dev/sdf, connected to ata17 on the 03:00.0 controller (JMicron JMB363), is the drive with the problems.
/dev/sdg, connected to ata18 on the same controller, is fine.
/dev/sdf shows a long delay when fetching the report from smartctl -a, and its SMART self-tests are frequently interrupted.
I will try a new cable later and report back.
Thanks,
shadrock