[arch-general] blk_update_request: I/O error, dev sda
Thore Bödecker
foxxx0 at archlinux.org
Mon May 31 19:37:04 UTC 2021
On 31.05.21 20:27, Morten Bo Johansen via arch-general wrote:
> I include it below. It says "Device Error Count: 12".
> Oddly enough, the command
> sudo smartctl -t long /dev/sda
> completes without any errors at all. The problem is also
> intermittent. It is rather odd.
This does seem odd. However, I am not sure what exactly an extended/long
SMART self-test does on an SSHD; does it include the cache SSD as
well?
It is my understanding, though, that the self-tests do not cover all
involved components of the drive to the full extent, and thus a
successful long self-test does not necessarily prove that the drive is
healthy.
> --------------- output from sudo smartctl -x /dev/sda -----------------
> ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
> 1 Raw_Read_Error_Rate POSR-- 118 099 006 - 192553232
> 3 Spin_Up_Time PO---- 099 098 000 - 0
> 4 Start_Stop_Count -O--CK 098 098 020 - 2850
> 5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
> 7 Seek_Error_Rate POSR-- 082 060 030 - 4492926169
> 9 Power_On_Hours -O--CK 078 078 000 - 19702
> 10 Spin_Retry_Count PO--C- 100 100 097 - 0
> 12 Power_Cycle_Count -O--CK 097 097 020 - 3079
> 184 End-to-End_Error -O--CK 077 077 099 NOW 23
The normalized value (077) is below the threshold (099), which
indicates a drive failure. Depending on what kind of End-to-End errors
are logged, it might (rarely) be just a defective cable, as was already
suggested.
> 187 Reported_Uncorrect -O--CK 099 099 000 - 1
> 188 Command_Timeout -O--CK 100 099 000 - 12
> 189 High_Fly_Writes -O-RCK 067 067 000 - 33
> 190 Airflow_Temperature_Cel -O---K 051 042 045 Past 49 (Min/Max 28/55 #251)
This indicates the drive is running rather hot. I suppose it is
enclosed in a laptop with poor ventilation and there is not much you
can do about it? However, if you can, the drive would appreciate a bit
more airflow. And in case you are going to replace it with an SSD (in
contrast to the current SSHD), those tend to generate less sustained
heat and should run cooler.
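To make the reading of the VALUE/WORST/THRESH columns explicit, here is
a rough Python sketch. It assumes the usual smartctl convention that an
attribute counts as failed when its normalized value is at or below the
threshold; the function name classify and the regular expression are my
own and not part of smartmontools. The rows are copied from the quoted
output.

```python
import re

# Rows copied from the quoted `smartctl -x` attribute table.
ROWS = """\
184 End-to-End_Error -O--CK 077 077 099 NOW 23
190 Airflow_Temperature_Cel -O---K 051 042 045 Past 49 (Min/Max 28/55 #251)
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
"""

# ID, name, flags, VALUE, WORST, THRESH, FAIL, then the raw value (free text).
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*)$"
)


def classify(line):
    """Return (id, name, status) for one attribute row, or None if it does not parse."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    attr_id, name, _flags, value, worst, thresh, _fail, _raw = m.groups()
    value, worst, thresh = int(value), int(worst), int(thresh)
    if value <= thresh:
        status = "failing now"         # current normalized value at/below threshold
    elif worst <= thresh:
        status = "failed in the past"  # only the worst recorded value crossed it
    else:
        status = "ok"
    return int(attr_id), name, status


if __name__ == "__main__":
    for row in ROWS.splitlines():
        print(classify(row))
```

For these three rows it reports 184 as failing now, 190 as failed in
the past and 199 as ok, which matches the FAIL column in the quoted
output.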
> 191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
> 192 Power-Off_Retract_Count -O--CK 100 100 000 - 169
> 193 Load_Cycle_Count -O--CK 099 099 000 - 2892
> 194 Temperature_Celsius -O---K 049 058 000 - 49 (0 9 0 0 0)
> 197 Current_Pending_Sector -O--C- 100 100 000 - 0
> 198 Offline_Uncorrectable ----C- 100 100 000 - 0
> 199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
> 254 Free_Fall_Sensor -O--CK 100 100 000 - 0
>
>
> Error 12 [11] occurred at disk power-on lifetime: 19675 hours (819 days + 19 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER -- ST COUNT LBA_48 LH LM LL DV DC
> -- -- -- == -- == == == -- -- -- -- --
> 40 -- 51 00 00 00 00 00 03 51 48 00 00 Error: UNC at LBA = 0x00035148 = 217416
>
> Commands leading to the command that caused the error were:
> CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
> -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
> 60 00 00 00 90 00 00 00 03 50 e8 40 00 00:00:41.749 READ FPDMA QUEUED
> 61 00 00 00 08 00 00 1c 44 0f 38 40 00 00:00:41.749 WRITE FPDMA QUEUED
> ea 00 00 00 00 00 00 00 00 00 00 a0 00 00:00:41.748 FLUSH CACHE EXT
> 61 00 00 00 08 00 00 00 03 57 b0 40 00 00:00:41.714 WRITE FPDMA QUEUED
> 61 00 00 00 08 00 00 07 00 44 28 40 00 00:00:41.714 WRITE FPDMA QUEUED
>
> Error 11 [10] occurred at disk power-on lifetime: 19675 hours (819 days + 19 hours)
> When the command that caused the error occurred, the device was active or idle.
>
> After command completion occurred, registers were:
> ER -- ST COUNT LBA_48 LH LM LL DV DC
> -- -- -- == -- == == == -- -- -- -- --
> 40 -- 51 00 00 00 00 12 eb 60 30 00 00 Error: UNC at LBA = 0x12eb6030 = 317415472
>
> Commands leading to the command that caused the error were:
> CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
> -- == -- == -- == == == -- -- -- -- -- --------------- --------------------
> 60 00 00 01 00 00 00 12 eb 5f f0 40 00 00:00:38.123 READ FPDMA QUEUED
> 60 00 00 00 c0 00 00 00 03 78 10 40 00 00:00:38.088 READ FPDMA QUEUED
> 60 00 00 00 08 00 00 12 eb 68 38 40 00 00:00:38.088 READ FPDMA QUEUED
> 60 00 00 00 80 00 00 00 03 6d c0 40 00 00:00:38.086 READ FPDMA QUEUED
> 60 00 00 00 68 00 00 00 03 61 b8 40 00 00:00:38.059 READ FPDMA QUEUED
>
> [...]
>
> [...]
>
Since these errors are not UDMA/CRC errors (which might be caused by a
defective cable) and only occurred very recently (compare the error
timestamps with Power_On_Hours), a faulty cable seems very unlikely
unless you recently poked at it (e.g. reconnected/reseated the drive
for whatever reason).
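As a quick check of the "very recently" part, the arithmetic can be
sketched in a few lines of Python. The numbers are taken from the
quoted output (Power_On_Hours raw value and the lifetime hours of
errors 11 and 12); the helper names are mine, and reading the six
LBA_48/LH/LM/LL register bytes as one big-endian hex number is my
assumption about the log layout, verified only against the LBA values
smartctl prints itself.

```python
POWER_ON_HOURS = 19702   # attribute 9, raw value
ERROR_AT_HOURS = 19675   # "disk power-on lifetime" of errors 11 and 12


def hours_since(error_at, now):
    """Power-on hours that passed between the logged error and the report."""
    return now - error_at


def lba_from_registers(hex_bytes):
    """Join space-separated register bytes and read them as one hex number."""
    return int("".join(hex_bytes.split()), 16)


if __name__ == "__main__":
    print(hours_since(ERROR_AT_HOURS, POWER_ON_HOURS))  # power-on hours ago
    print(lba_from_registers("00 00 00 03 51 48"))      # error 12
    print(lba_from_registers("00 00 12 eb 60 30"))      # error 11
```

That gives 27 power-on hours between the two logged errors and the
time of the report, out of 19702 in total, and the decoded LBAs
(217416 and 317415472) agree with the "UNC at LBA" lines.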
In my opinion this drive is on the brink of failure and should be
replaced immediately.
Cheers
--
Thore "foxxx0" Bödecker
GPG ID: 0xEB763B4E9DB887A6
GPG FP: 051E AD6A 6155 389D 69DA 02E5 EB76 3B4E 9DB8 87A6