On 31.05.21 20:27, Morten Bo Johansen via arch-general wrote:
I include it below. It says "Device Error Count: 12". Oddly enough, the command sudo smartctl -t long /dev/sda completes without any errors at all. The problem is also intermittent. It is rather odd.
This seems odd indeed. I am not sure, however, what exactly an extended/long SMART self-test does on an SSHD; does it include the cache SSD as well? My understanding is that the self-tests do not cover all components of the drive to the full extent, so a successful long self-test does not necessarily prove the drive is healthy.
--------------- output from sudo smartctl -x /dev/sda -----------------
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   118   099   006    -    192553232
  3 Spin_Up_Time            PO----   099   098   000    -    0
  4 Start_Stop_Count        -O--CK   098   098   020    -    2850
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   082   060   030    -    4492926169
  9 Power_On_Hours          -O--CK   078   078   000    -    19702
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   097   097   020    -    3079
184 End-to-End_Error        -O--CK   077   077   099    NOW  23
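This indicates a drive failure: the normalized value (077) has dropped below the threshold (099), which is why the attribute is flagged as failing NOW. Depending on what kind of End-to-End errors are logged, it might (rarely) be just a defective cable, as was already suggested.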
187 Reported_Uncorrect      -O--CK   099   099   000    -    1
188 Command_Timeout         -O--CK   100   099   000    -    12
189 High_Fly_Writes         -O-RCK   067   067   000    -    33
190 Airflow_Temperature_Cel -O---K   051   042   045    Past 49 (Min/Max 28/55 #251)
This indicates the drive is running rather hot. I suppose it is enclosed in a laptop with poor ventilation, so there is not much you can do about it? If you can, though, the drive would appreciate a bit more airflow. If you replace it with an SSD (rather than the current SSHD), those tend to produce less sustained heat and should run cooler.
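For reference, the NOW and Past flags on attributes 184 and 190 follow from comparing the normalized VALUE and WORST columns against THRESH, as described in the smartctl man page. Below is a minimal sketch of that comparison in Python. The helper name `classify` is made up for illustration, the rows are copied from the output quoted in this message, and the rule is a simplification that ignores any vendor-specific special cases.

```python
# Rows copied from the smartctl -x output quoted in this message.
ROWS = """\
5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
184 End-to-End_Error -O--CK 077 077 099 NOW 23
190 Airflow_Temperature_Cel -O---K 051 042 045 Past 49 (Min/Max 28/55 #251)
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
"""


def classify(line):
    """Return (name, status) for one attribute row (hypothetical helper)."""
    fields = line.split()
    name = fields[1]
    # Columns 4-6 are the normalized VALUE, WORST and THRESH.
    value, worst, thresh = (int(f) for f in fields[3:6])
    if value <= thresh:
        status = "NOW"    # currently at or below the threshold
    elif worst <= thresh:
        status = "Past"   # was at or below the threshold at some point
    else:
        status = "-"      # never reached the threshold
    return name, status


results = dict(classify(line) for line in ROWS.splitlines())
for name, status in results.items():
    print(f"{name:26} {status}")
```

Note that the comparison uses the normalized columns only. The RAW_VALUE column is vendor-specific and cannot be compared against THRESH.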
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    169
193 Load_Cycle_Count        -O--CK   099   099   000    -    2892
194 Temperature_Celsius     -O---K   049   058   000    -    49 (0 9 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
254 Free_Fall_Sensor        -O--CK   100   100   000    -    0
Error 12 [11] occurred at disk power-on lifetime: 19675 hours (819 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 03 51 48 00 00  Error: UNC at LBA = 0x00035148 = 217416

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 00 90 00 00 00 03 50 e8 40 00     00:00:41.749  READ FPDMA QUEUED
  61 00 00 00 08 00 00 1c 44 0f 38 40 00     00:00:41.749  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 a0 00     00:00:41.748  FLUSH CACHE EXT
  61 00 00 00 08 00 00 00 03 57 b0 40 00     00:00:41.714  WRITE FPDMA QUEUED
  61 00 00 00 08 00 00 07 00 44 28 40 00     00:00:41.714  WRITE FPDMA QUEUED
Error 11 [10] occurred at disk power-on lifetime: 19675 hours (819 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 12 eb 60 30 00 00  Error: UNC at LBA = 0x12eb6030 = 317415472

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 01 00 00 00 12 eb 5f f0 40 00     00:00:38.123  READ FPDMA QUEUED
  60 00 00 00 c0 00 00 00 03 78 10 40 00     00:00:38.088  READ FPDMA QUEUED
  60 00 00 00 08 00 00 12 eb 68 38 40 00     00:00:38.088  READ FPDMA QUEUED
  60 00 00 00 80 00 00 00 03 6d c0 40 00     00:00:38.086  READ FPDMA QUEUED
  60 00 00 00 68 00 00 00 03 61 b8 40 00     00:00:38.059  READ FPDMA QUEUED
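As a cross-check, the "UNC at LBA" values in the two error entries above can be recomputed from the register rows themselves. The following Python sketch assumes the column order shown in the header (ER, --, ST, two COUNT bytes, three LBA_48 bytes, then LH, LM, LL, DV, DC). The helper name `decode_lba` is hypothetical.

```python
def decode_lba(register_line):
    """Decode the 48-bit LBA from a 'registers were' row (hypothetical helper).

    Assumed column order: ER -- ST COUNT(2) LBA_48(3) LH LM LL DV DC.
    """
    fields = register_line.split()
    # Fields 5..10 are the three LBA_48 bytes followed by LH, LM and LL,
    # most significant byte first.
    lba_bytes = fields[5:11]
    return int("".join(lba_bytes), 16)


# Register rows copied from Error 12 and Error 11 above.
lba_error_12 = decode_lba("40 -- 51 00 00 00 00 00 03 51 48 00 00")
lba_error_11 = decode_lba("40 -- 51 00 00 00 00 12 eb 60 30 00 00")

print(lba_error_12, hex(lba_error_12))
print(lba_error_11, hex(lba_error_11))
```

Both results match the decimal values smartctl prints (217416 and 317415472), so the two unreadable sectors lie far apart on the disk rather than next to each other.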
[...]
[...]
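How recent the logged errors are can be read off by comparing the error log's power-on lifetime with the raw value of attribute 9 (Power_On_Hours). A small Python sketch of that arithmetic, using only numbers from the output quoted above (the variable names are illustrative):

```python
power_on_hours = 19702         # attribute 9, Power_On_Hours, RAW_VALUE
error_lifetime_hours = 19675   # "occurred at disk power-on lifetime" for errors 11 and 12

# Both values count hours the drive was powered on, not wall-clock time.
hours_since_error = power_on_hours - error_lifetime_hours
share_of_lifetime = hours_since_error / power_on_hours

print(f"errors logged {hours_since_error} power-on hours ago "
      f"({share_of_lifetime:.2%} of the drive's lifetime)")
```

That is 27 power-on hours between the two logged errors and the time the report was taken.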
Since these errors are not UDMA/CRC errors (which a defective cable might cause) and occurred only very recently (compare the error timestamps with Power_On_Hours), a faulty cable seems very unlikely, unless you recently poked at it (e.g. reconnected or reseated the drive for whatever reason). In my opinion this drive is on the brink of failure and should be replaced immediately.

Cheers
-- 
Thore "foxxx0" Bödecker
GPG ID: 0xEB763B4E9DB887A6
GPG FP: 051E AD6A 6155 389D 69DA 02E5 EB76 3B4E 9DB8 87A6