diff --git a/HowtoSmart.md b/HowtoSmart.md
index 24374b2e..69e7c2d5 100644
--- a/HowtoSmart.md
+++ b/HowtoSmart.md
@@ -29,8 +29,27 @@ smartmontools build with: C++98, GCC 5.4.0 20160609
 man:smartd.conf(5)
 ~~~
 
+## Basic usage
 
-## Usage
+A few basic command examples:
+
+~~~
+# smartctl --scan
+# smartctl -a /dev/sda
+# smartctl -a /dev/sda | grep Power_On_Hours
+# smartctl -a /dev/sda | grep Power_Cycle_Count
+# smartctl -a /dev/sda -d megaraid,0
+# smartctl -i /dev/sg0
+~~~
+
+
+## smartctl
+
+To make sure all SMART features are enabled on a disk:
+
+~~~
+# smartctl -s on -o on -S on /dev/sda
+~~~
 
 ### Listing disks
 
@@ -60,7 +79,7 @@ On a machine with hardware RAID:
 /dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
 ~~~
 
-### Viewing information on a disk
+### Viewing a disk's information
 
 The `-i` option displays information about a disk:
 
@@ -85,6 +104,33 @@ SMART support is: Available - device has SMART capability.
 SMART support is: Enabled
 ~~~
 
+The `-l error` option displays any errors logged for a disk:
+
+~~~
+# smartctl -l error /dev/sda
+
+=== START OF SMART DATA SECTION ===
+Error Information (NVMe Log 0x01, max 64 entries)
+Num ErrCount SQId CmdId Status PELoc LBA NSID VS
+ 0 120 0 0x0008 0x4004 - 0 0 -
+ 1 119 0 0x0018 0x4004 0x02c 0 0 -
+ 2 118 0 0x0017 0x4004 0x02c 0 0 -
+ 3 117 0 0x0008 0x4004 - 0 0 -
+ 4 116 0 0x0018 0x4004 0x02c 0 0 -
+ 5 115 0 0x0017 0x4004 0x02c 0 0 -
+ 6 114 0 0x0008 0x4004 - 0 0 -
+ 7 113 0 0x0018 0x4004 0x02c 0 0 -
+ 8 112 0 0x0017 0x4004 0x02c 0 0 -
+ 9 111 0 0x0008 0x4004 - 0 0 -
+ 10 110 0 0x0008 0x4004 - 0 0 -
+ 11 109 0 0x0008 0x4004 0x02c 0 0 -
+ 12 108 0 0x0008 0x4004 0x02c 0 0 -
+ 13 107 0 0x0018 0x4004 0x02c 0 0 -
+ 14 106 0 0x0017 0x4004 0x02c 0 0 -
+ 15 105 0 0x0008 0x4004 0x02c 0 0 -
+...
(48 entries not shown)
+~~~
+
 The `-a` option displays all SMART information:
 
 ~~~
@@ -125,37 +171,91 @@ Id Fmt Data Metadt Rel_Perf
 === START OF SMART DATA SECTION ===
 SMART overall-health self-assessment test result: PASSED
 
-SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
-Critical Warning: 0x00
-Temperature: 32 Celsius
-Available Spare: 100%
-Available Spare Threshold: 10%
-Percentage Used: 0%
-Data Units Read: 122 540 [62,7 GB]
-Data Units Written: 1 927 650 [986 GB]
-Host Read Commands: 1 767 402
-Host Write Commands: 31 997 703
-Controller Busy Time: 47
-Power Cycles: 371
-Power On Hours: 748
-Unsafe Shutdowns: 53
-Media and Data Integrity Errors: 0
-Error Information Log Entries: 120
-Warning Comp. Temperature Time: 0
-Critical Comp. Temperature Time: 0
-Temperature Sensor 1: 32 Celsius
-Temperature Sensor 2: 34 Celsius
+General SMART Values:
+Offline data collection status: (0x00) Offline data collection activity
+ was never started.
+ Auto Offline Data Collection: Disabled.
+Self-test execution status: ( 23) The self-test routine was aborted by
+ the host.
+Total time to complete Offline
+data collection: ( 1) seconds.
+Offline data collection
+capabilities: (0x75) SMART execute Offline immediate.
+ No Auto Offline data collection support.
+ Abort Offline collection upon new
+ command.
+ No Offline surface scan supported.
+ Self-test supported.
+ Conveyance Self-test supported.
+ Selective Self-test supported.
+SMART capabilities: (0x0003) Saves SMART data before entering
+ power-saving mode.
+ Supports SMART auto save timer.
+Error logging capability: (0x01) Error logging supported.
+ General Purpose Logging supported.
+Short self-test routine
+recommended polling time: ( 1) minutes.
+Extended self-test routine
+recommended polling time: ( 1) minutes.
+Conveyance self-test routine
+recommended polling time: ( 1) minutes.
+SCT capabilities: (0x003d) SCT Status supported.
+ SCT Error Recovery Control supported.
+ SCT Feature Control supported.
+ SCT Data Table supported.
 
-Error Information (NVMe Log 0x01, max 64 entries)
-Num ErrCount SQId CmdId Status PELoc LBA NSID VS
- 0 120 0 0x0008 0x4004 - 0 0 -
- 1 119 0 0x0018 0x4004 0x02c 0 0 -
-[...]
+SMART Attributes Data Structure revision number: 5
+Vendor Specific SMART Attributes with Thresholds:
+ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
+ 3 Spin_Up_Time 0x0020 100 100 000 Old_age Offline - 0
+ 4 Start_Stop_Count 0x0030 100 100 000 Old_age Offline - 0
+ 5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
+ 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 49872
+ 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 15
+170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0
+171 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
+172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
+183 Runtime_Bad_Block 0x0030 100 100 000 Old_age Offline - 0
+184 End-to-End_Error 0x0032 100 100 090 Old_age Always - 0
+187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
+192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 13
+199 UDMA_CRC_Error_Count 0x0030 100 100 000 Old_age Offline - 5
+225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 575610
+226 Load-in_Time 0x0032 100 100 000 Old_age Always - 18829
+227 Torq-amp_Count 0x0032 100 100 000 Old_age Always - 0
+228 Power-off_Retract_Count 0x0032 100 100 000 Old_age Always - 2992332
+232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
+233 Media_Wearout_Indicator 0x0032 082 082 000 Old_age Always - 0
+241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 575610
+242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 581199
+
+SMART Error Log Version: 1
+No Errors Logged
+
+SMART Self-test log structure revision number 1
+Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
+# 1 Short captive Completed without error 10% 49872 -
+# 2 Extended offline
Completed without error 00% 49872 -
+# 3 Reserved (0x20) Completed without error 00% 49872 -
+# 4 Reserved (0x20) Completed without error 10% 14 -
+# 5 Reserved (0x20) Completed without error 10% 4 -
+# 6 Reserved (0x20) Completed without error 10% 4 -
+# 7 Vendor (0x58) Completed without error 10% 4 -
+
+Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
+SMART Selective self-test log data structure revision number 0
+Note: revision number not 1 implies that no selective self-test has ever been run
+ SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
+ 1 0 0 Not_testing
+ 2 0 0 Not_testing
+ 3 0 0 Not_testing
+ 4 0 0 Not_testing
+ 5 0 0 Not_testing
+Selective self-test flags (0x0):
+ After scanning selected spans, do NOT read-scan remainder of disk.
+If Selective self-test is pending on power-up, resume after 0 minute delay.
 ~~~
 
-
-### Hardware RAID
-
 If your disk is not a physical disk but a hardware RAID volume, you must specify the type and number of the physical disk you want:
 
 ~~~
@@ -185,8 +285,6 @@ In some cases, the RAID controller offers a way to see the dis
 # modprobe sg
 # smartctl -i /dev/sg0
-smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-92-generic] (local build)
-Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
 
 === START OF INFORMATION SECTION ===
 Model Family: Toshiba 3.5" MG03ACAxxx(Y) Enterprise HDD
@@ -207,13 +305,89 @@ SMART support is: Available - device has SMART capability.
 SMART support is: Enabled
 ~~~
 
+### Testing a disk
+
+You can run a short test on a disk:
+
+~~~
+# smartctl -t short /dev/sda
+
+=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
+Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
+Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
+Testing has begun.
+Please wait 1 minutes for test to complete.
+Test will complete after Thu Dec 7 02:51:10 2017
+~~~
+
+You can view the test results with:
+
+~~~
+# smartctl -l selftest /dev/sda
+
+=== START OF READ SMART DATA SECTION ===
+SMART Self-test log structure revision number 1
+Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
+# 1 Extended offline Completed without error 00% 49872 -
+# 2 Reserved (0x20) Completed without error 00% 49872 -
+# 3 Reserved (0x20) Completed without error 10% 14 -
+# 4 Reserved (0x20) Completed without error 10% 4 -
+# 5 Reserved (0x20) Completed without error 10% 4 -
+# 6 Vendor (0x58) Completed without error 10% 4 -
+~~~
+
+You can also run a long test:
+
+~~~
+# smartctl -t long /dev/sda
+~~~
+
+To abort the test in progress:
+
+~~~
+# smartctl -X /dev/sda
+
+=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
+Sending command: "Abort SMART off-line mode self-test routine".
+Self-testing aborted!
+~~~
+
+
+## smartd
+
+Enable **smartd** by listing the relevant devices in `/etc/default/smartmontools`:
+
+~~~
+enable_smart="/dev/sda /dev/sdb"
+start_smartd=yes
+smartd_opts="--interval=1800"
+~~~
+
+Then you can set the email address that receives alerts in `/etc/smartd.conf`:
+
+~~~
+DEVICESCAN -d removable -n standby -m monitoring@example.com -M exec /usr/share/smartmontools/smartd-runner
+~~~
 
 ## FAQ
 
 See
 
-smartctl -s on /dev/hda %enable
-smartctl -a /dev/hda %info
-smartctl -t long /dev/hda
-smartctl -l error /dev/hda
-gg
+### Device does not support SMART
+
+Some disks do not support SMART. Example:
+
+~~~
+# smartctl -a /dev/sda
+
+Device: ATA Maxtor 7Y250M0 Version: YAR5
+Serial number: XXXXXX
+Device type: disk
+Local Time is: Thu Dec 7 01:59:43 2017 CET
+Device does not support SMART
+
+Error Counter logging not supported
+
+[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
+Device does not support Self Test logging
+~~~
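
The `Device does not support SMART` case documented in the FAQ could also be detected from a script before running tests. The following is a minimal sketch in POSIX shell, not part of the change itself. It assumes the input is `smartctl -i` or `smartctl -a` output on stdin; `smart_support_status` is a hypothetical helper name, and the `SMART support is: Disabled` pattern is an assumption about smartctl's wording, since only the `Enabled` form appears in the outputs quoted here.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): classify SMART support
# from smartctl output read on stdin.
# Prints one of: unsupported, enabled, disabled, unknown.
smart_support_status() {
    awk '
        # Message quoted in the FAQ for disks without SMART; it always wins.
        /does not support SMART/         { state = "unsupported" }
        # Status line shown in the `smartctl -i` examples.
        /^SMART support is: +Enabled/    { if (state == "") state = "enabled" }
        # Assumed wording for a disk with SMART switched off.
        /^SMART support is: +Disabled/   { if (state == "") state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

A possible use is `smartctl -i /dev/sda | smart_support_status`, running `smartctl -t short /dev/sda` only when the result is `enabled`. The function reads text rather than calling `smartctl` itself, so it can be checked against saved outputs.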