Re: [arch-general] blk_update_request: I/O error, dev sda

31 May 2021

      On 31.05.21 20:27, Morten Bo Johansen via arch-general wrote:
...
I include it below. It says "Device Error Count: 12".
Oddly enough, the command
   sudo smartctl -t long /dev/sda
completes without any errors at all. The problem is also
intermittent. It is rather odd.
This does seem odd. However, I am not sure what exactly an
extended/long SMART self-test does on an SSHD; does it include the
cache SSD as well?
It is my understanding, though, that the self-tests do not cover all
components of the drive to the full extent, so a successful long
self-test does not necessarily prove the drive is healthy.
...
--------------- output from sudo smartctl -x /dev/sda -----------------
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   118   099   006    -    192553232
  3 Spin_Up_Time            PO----   099   098   000    -    0
  4 Start_Stop_Count        -O--CK   098   098   020    -    2850
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   082   060   030    -    4492926169
  9 Power_On_Hours          -O--CK   078   078   000    -    19702
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   097   097   020    -    3079
184 End-to-End_Error        -O--CK   077   077   099    NOW  23
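This indicates a drive failure: the normalized value (077) is below the
threshold (099), which is why smartctl reports it as failing NOW.
Depending on what kind of End-to-End errors are logged, it might
(rarely) be just a defective cable, as was already suggested.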
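
To make the VALUE/WORST/THRESH comparison explicit, here is a minimal
Python sketch that classifies attribute rows the way I read the table.
The regex, the classify() helper and the status strings are my own
illustration, not part of smartmontools, and I am assuming that a
threshold of 000 means "never fails". The sample rows are copied from
the output in this mail.

```python
import re

# Rows copied from the smartctl -x output quoted in this mail.
SAMPLE = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
184 End-to-End_Error        -O--CK   077   077   099    NOW  23
190 Airflow_Temperature_Cel -O---K   051   042   045    Past 49 (Min/Max 28/55 #251)
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
"""

# ID, name, flags, three normalized numbers, FAIL column, then the raw value.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flags>\S+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<fail>\S+)\s+(?P<raw>.*)$"
)


def classify(text):
    """Map attribute name -> 'failing now', 'failed in the past' or 'ok'."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers, blank lines and anything unexpected
        value = int(m["value"])
        worst = int(m["worst"])
        thresh = int(m["thresh"])
        # Assumption: a threshold of 0 means the attribute cannot fail.
        if thresh > 0 and value <= thresh:
            status = "failing now"
        elif thresh > 0 and worst <= thresh:
            status = "failed in the past"
        else:
            status = "ok"
        result[m["name"]] = status
    return result


if __name__ == "__main__":
    for name, status in classify(SAMPLE).items():
        print(f"{name:26s} {status}")
```

Fed with the rows above, this flags End-to-End_Error as failing now and
Airflow_Temperature_Cel as having failed in the past, which matches the
NOW and Past markers in the FAIL column.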
...
187 Reported_Uncorrect      -O--CK   099   099   000    -    1
188 Command_Timeout         -O--CK   100   099   000    -    12
189 High_Fly_Writes         -O-RCK   067   067   000    -    33
190 Airflow_Temperature_Cel -O---K   051   042   045    Past 49 (Min/Max 28/55 #251)
This indicates the drive is running rather hot. I suppose it is
enclosed in a laptop with poor ventilation and there is not much you
can do about it? If you can, though, the drive would benefit from a bit
more airflow. And if you are going to replace it with an SSD (rather
than the current SSHD), those tend to generate less sustained heat and
should run cooler.
...
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    169
193 Load_Cycle_Count        -O--CK   099   099   000    -    2892
194 Temperature_Celsius     -O---K   049   058   000    -    49 (0 9 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
254 Free_Fall_Sensor        -O--CK   100   100   000    -    0
Error 12 [11] occurred at disk power-on lifetime: 19675 hours (819 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 03 51 48 00 00  Error: UNC at LBA = 0x00035148 = 217416
Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 90 00 00 00 03 50 e8 40 00     00:00:41.749  READ FPDMA QUEUED
  61 00 00 00 08 00 00 1c 44 0f 38 40 00     00:00:41.749  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 a0 00     00:00:41.748  FLUSH CACHE EXT
  61 00 00 00 08 00 00 00 03 57 b0 40 00     00:00:41.714  WRITE FPDMA QUEUED
  61 00 00 00 08 00 00 07 00 44 28 40 00     00:00:41.714  WRITE FPDMA QUEUED
Error 11 [10] occurred at disk power-on lifetime: 19675 hours (819 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 12 eb 60 30 00 00  Error: UNC at LBA = 0x12eb6030 = 317415472
Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 01 00 00 00 12 eb 5f f0 40 00     00:00:38.123  READ FPDMA QUEUED
  60 00 00 00 c0 00 00 00 03 78 10 40 00     00:00:38.088  READ FPDMA QUEUED
  60 00 00 00 08 00 00 12 eb 68 38 40 00     00:00:38.088  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 03 6d c0 40 00     00:00:38.086  READ FPDMA QUEUED
  60 00 00 00 68 00 00 00 03 61 b8 40 00     00:00:38.059  READ FPDMA QUEUED
[...]
[...]
Since these errors are not UDMA/CRC errors (which might be caused by a
defective cable) and occurred only very recently (compare the error
timestamp of 19675 hours with the Power_On_Hours value of 19702), a
faulty cable seems very unlikely unless you recently poked at it (e.g.
reconnected/reseated the drive for whatever reason).
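
For reference, the arithmetic behind "very recently" and the LBA
decoding can be checked with a few lines of Python. This is only a
sketch: the helper names are mine, and I am assuming the six bytes
under the LBA_48 and LH LM LL columns are listed most significant
first, which is consistent with the "Error: UNC at LBA = ..." text
printed by smartctl.

```python
POWER_ON_HOURS = 19702  # raw value of attribute 9 in the output above
ERROR_HOURS = 19675     # power-on lifetime logged for errors 11 and 12


def hours_since_error(power_on_hours, error_hours):
    """Power-on hours that have passed since the error was logged."""
    return power_on_hours - error_hours


def lba_from_registers(hex_bytes):
    """Join the six LBA register bytes (most significant first) into an int."""
    return int("".join(hex_bytes), 16)


if __name__ == "__main__":
    # 27 power-on hours between the logged errors and this report
    print(hours_since_error(POWER_ON_HOURS, ERROR_HOURS))
    # (819, 19), i.e. "819 days + 19 hours" as printed by smartctl
    print(divmod(ERROR_HOURS, 24))
    # Error 12: register bytes 00 00 00 03 51 48 -> 217416
    print(lba_from_registers(["00", "00", "00", "03", "51", "48"]))
    # Error 11: register bytes 00 00 12 eb 60 30 -> 317415472
    print(lba_from_registers(["00", "00", "12", "eb", "60", "30"]))
```

So both uncorrectable read errors were logged within the last 27
power-on hours of a drive that has been running for more than 19000.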

In my opinion this drive is on the brink of failure and should be
replaced immediately.

Cheers

-- 
Thore "foxxx0" Bödecker

GPG ID: 0xEB763B4E9DB887A6
GPG FP: 051E AD6A 6155 389D 69DA  02E5 EB76 3B4E 9DB8 87A6
