---
categories: hardware storage
title: Howto SMART
...

[SMART](https://fr.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology) (Self-Monitoring, Analysis and Reporting Technology) is built into most hard drives to expose diagnostic indicators. On Linux/Unix, [Smartmontools](https://www.smartmontools.org/) is the tool for working with SMART, mainly through the `smartctl` command and the `smartd` daemon.

## Installation

~~~
# apt install smartmontools

$ /usr/sbin/smartctl -V
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-4-amd64] (local build)
[...]
smartmontools release 6.6 dated 2016-05-07 at 11:17:46 UTC
smartmontools SVN rev 4324 dated 2016-05-31 at 20:45:50
smartmontools build host: x86_64-pc-linux-gnu
smartmontools build with: C++98, GCC 5.4.0 20160609
[...]

# systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
     Docs: man:smartd(8)
           man:smartd.conf(5)
~~~

## Basic usage

A few basic commands:

~~~
# smartctl --scan
# smartctl -a /dev/sda
# smartctl -a /dev/sda | grep -E 'Serial|Error'
# smartctl -a /dev/sda | grep Power_On_Hours
# smartctl -a /dev/sda | grep Power_Cycle_Count
# smartctl -a /dev/sda -d megaraid,0
# smartctl -i /dev/sg0
~~~

## smartctl

To make sure all SMART features are enabled on a disk (SMART itself with `-s on`, automatic offline testing with `-o on`, attribute autosave with `-S on`):

~~~
# smartctl -s on -o on -S on /dev/sda
~~~

### Listing disks

On a machine with a single disk:

~~~
# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
~~~

On a machine with hardware RAID:

~~~
# smartctl --scan
/dev/hdd -d ata # /dev/hdd, ATA device
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
~~~

### Viewing disk information

The `-i` option displays identification information for a disk:

~~~
# smartctl -i /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Laptop Thin HDD
Device Model:     ST500LM021-1KJ152
Serial Number:    XXXXXXXX
LU WWN Device Id: 5 000c50 09cbac333
Firmware Version: 0005SDM1
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 28 16:19:49 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
~~~

The `-l error` option displays any errors logged by a disk:

~~~
# smartctl -l error /dev/sda

=== START OF SMART DATA SECTION ===
Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        120     0  0x0008  0x4004      -            0     0     -
  1        119     0  0x0018  0x4004  0x02c            0     0     -
  2        118     0  0x0017  0x4004  0x02c            0     0     -
  3        117     0  0x0008  0x4004      -            0     0     -
  4        116     0  0x0018  0x4004  0x02c            0     0     -
  5        115     0  0x0017  0x4004  0x02c            0     0     -
  6        114     0  0x0008  0x4004      -            0     0     -
  7        113     0  0x0018  0x4004  0x02c            0     0     -
  8        112     0  0x0017  0x4004  0x02c            0     0     -
  9        111     0  0x0008  0x4004      -            0     0     -
 10        110     0  0x0008  0x4004      -            0     0     -
 11        109     0  0x0008  0x4004  0x02c            0     0     -
 12        108     0  0x0008  0x4004  0x02c            0     0     -
 13        107     0  0x0018  0x4004  0x02c            0     0     -
 14        106     0  0x0017  0x4004  0x02c            0     0     -
 15        105     0  0x0008  0x4004  0x02c            0     0     -
... (48 entries not shown)
~~~

The `-a` option displays all SMART information:

~~~
# smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZVLW256HEHP-000L7
Serial Number:                      XXXXXXXX
Firmware Version:                   4L7QCXB7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 256 060 514 304 [256 GB]
Unallocated NVM Capacity:           0
Controller ID:                      2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256 060 514 304 [256 GB]
Namespace 1 Utilization:            208 604 237 824 [208 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Dec  4 00:16:33 2017 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL *Other*
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Warning  Comp. Temp. Threshold:     69 Celsius
Critical Comp. Temp. Threshold:     72 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.60W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     5.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1500
 4 -   0.0050W       -        -    4  4  4  4     2200    6000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  23) The self-test routine was aborted by
                                        the host.
Total time to complete Offline
data collection:                 (   1) seconds.
Offline data collection
capabilities:                    (0x75) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.
Conveyance self-test routine
recommended polling time:        (   1) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       49872
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0030   100   100   000    Old_age   Offline      -       0
184 End-to-End_Error        0x0032   100   100   090    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       13
199 UDMA_CRC_Error_Count    0x0030   100   100   000    Old_age   Offline      -       5
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       575610
226 Load-in_Time            0x0032   100   100   000    Old_age   Always       -       18829
227 Torq-amp_Count          0x0032   100   100   000    Old_age   Always       -       0
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2992332
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   082   082   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       575610
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       581199

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed without error       10%     49872         -
# 2  Extended offline    Completed without error       00%     49872         -
# 3  Reserved (0x20)     Completed without error       00%     49872         -
# 4  Reserved (0x20)     Completed without error       10%        14         -
# 5  Reserved (0x20)     Completed without error       10%         4         -
# 6  Reserved (0x20)     Completed without error       10%         4         -
# 7  Vendor (0x58)       Completed without error       10%         4         -

Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
~~~

If the device is not a physical disk but a volume on a hardware RAID controller, specify the controller type and the number of the physical disk you want to query:

~~~
# smartctl -i /dev/sda -d megaraid,0

=== START OF INFORMATION SECTION ===
Device Model:     SSDSC2BB480G7R
Serial Number:    XXXXXXXXXXXXXXXXXX
LU WWN Device Id: 5 5cd2e4 14d52d0aa
Add. Product Id:  DELL(tm)
Firmware Version: N201DL41
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 28 16:27:57 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
~~~

In some cases, the RAID controller can expose the disk through a generic SCSI module:

~~~
# modprobe sg
# smartctl -i /dev/sg0

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 3.5" MG03ACAxxx(Y) Enterprise HDD
Device Model:     TOSHIBA MG03ACA100
Serial Number:    XXXXX
LU WWN Device Id: 5 000039 4eb981078
Add. Product Id:  DELL(tm)
Firmware Version: FL1D
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Dec  1 11:57:19 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
~~~

### Testing a disk

To run a short self-test on a disk:

~~~
# smartctl -t short /dev/sda

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Thu Dec  7 02:51:10 2017
~~~

The test results can be viewed with:

~~~
# smartctl -l selftest /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     49872         -
# 2  Reserved (0x20)     Completed without error       00%     49872         -
# 3  Reserved (0x20)     Completed without error       10%        14         -
# 4  Reserved (0x20)     Completed without error       10%         4         -
# 5  Reserved (0x20)     Completed without error       10%         4         -
# 6  Vendor (0x58)       Completed without error       10%         4         -
~~~

A long test can also be run:

~~~
# smartctl -t long /dev/sda
~~~

To abort the test in progress:

~~~
# smartctl -X /dev/sda

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!
~~~

## smartd

**smartd** is enabled by listing the devices to monitor in `/etc/default/smartmontools`:

~~~
enable_smart="/dev/sda /dev/sdb"
start_smartd=yes
smartd_opts="--interval=1800"
~~~

The email address that receives alerts can then be set in `/etc/smartd.conf`:

~~~
DEVICESCAN -d removable -n standby -m monitoring@example.com -M exec /usr/share/smartmontools/smartd-runner
~~~

## FAQ

### Device does not support SMART

Some disks do not support SMART. Example:

~~~
# smartctl -a /dev/sda

Device: ATA      Maxtor 7Y250M0   Version: YAR5
Serial number: XXXXXX
Device type: disk
Local Time is: Thu Dec  7 01:59:43 2017 CET
Device does not support SMART

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
~~~

### NVMe support

By default, smartd does not monitor NVMe drives because support is considered unstable. They can still be monitored by adding `-d nvme` to the configuration:

~~~ { .diff }
diff --git a/smartd.conf b/smartd.conf
index 4cdede7..81619c9 100644
--- a/smartd.conf
+++ b/smartd.conf
@@ -18,7 +18,7 @@
 # Directives listed below, which will be applied to all devices that
 # are found. Most users should comment out DEVICESCAN and explicitly
 # list the devices that they wish to monitor.
-DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
+DEVICESCAN -d removable -d nvme -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
~~~
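
Before editing the file by hand, it can help to check whether the active `DEVICESCAN` line already carries `-d nvme` and to preview the change. The following is only a minimal sketch: `has_nvme_scan` and `add_nvme_scan` are hypothetical helper names (not part of smartmontools), and the `sed` expression assumes the `DEVICESCAN` line contains `-d removable`, as in the diff above.

```shell
#!/bin/sh
# Sketch only: has_nvme_scan and add_nvme_scan are hypothetical helpers,
# not commands shipped with smartmontools.

# Succeed if an uncommented DEVICESCAN line already contains "-d nvme".
has_nvme_scan() {
    grep -E '^[[:space:]]*DEVICESCAN' "$1" | grep -Eq -- '-d[[:space:]]+nvme'
}

# Print the configuration with "-d nvme" inserted after "-d removable" on
# uncommented DEVICESCAN lines that lack it (mirrors the diff above).
# The file itself is not modified; lines without "-d removable" are left as-is.
add_nvme_scan() {
    sed -E '/^[[:space:]]*DEVICESCAN/{/-d[[:space:]]+nvme/!s/-d removable/-d removable -d nvme/;}' "$1"
}

# Example (as root), previewing the change without writing anything:
#   has_nvme_scan /etc/smartd.conf || add_nvme_scan /etc/smartd.conf | diff -u /etc/smartd.conf -
```

Printing to standard output rather than editing in place keeps the helper safe to run repeatedly; once the preview looks right, the result can be written to the configuration file and `smartd` restarted.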