---
categories: hardware storage
title: Howto SMART
...

Documentation: <https://www.smartmontools.org/wiki/TocDoc>

[SMART](https://fr.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology) (Self-Monitoring, Analysis and Reporting Technology) is built into most hard drives to expose diagnostic indicators. On Linux/Unix, [Smartmontools](https://www.smartmontools.org/) is the tool for working with SMART, mainly through the `smartctl` command and the `smartd` daemon.

## Installation

~~~
# apt install smartmontools

$ /usr/sbin/smartctl -V
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-4-amd64] (local build)
[...]
smartmontools release 6.6 dated 2016-05-07 at 11:17:46 UTC
smartmontools SVN rev 4324 dated 2016-05-31 at 20:45:50
smartmontools build host: x86_64-pc-linux-gnu
smartmontools build with: C++98, GCC 5.4.0 20160609
[...]

# systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
     Docs: man:smartd(8)
           man:smartd.conf(5)
~~~

## Basic usage

A few basic command examples:

~~~
# smartctl --scan
# smartctl -a /dev/sda
# smartctl -a /dev/sda | grep -E 'Serial|Error'
# smartctl -a /dev/sda | grep Power_On_Hours
# smartctl -a /dev/sda | grep Power_Cycle_Count
# smartctl -a /dev/sda -d megaraid,0
# smartctl -i /dev/sg0
~~~

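The `grep` examples above print the whole attribute line. When only the raw value is needed (for a monitoring check, say), a small filter can pick out the last column. This is a minimal sketch: `smart_raw_value` is a hypothetical helper name, not part of smartmontools, and it assumes the ATA attribute table layout in which the attribute name is the 2nd field and the raw value the 10th. Raw values that contain spaces (some temperature attributes) would be truncated to their first word.

```shell
#!/bin/sh
# Print the RAW_VALUE column of one attribute from an ATA attribute table
# (the format printed by `smartctl -A` or `smartctl -a`) read on stdin.
# smart_raw_value is a hypothetical helper, not a smartmontools command.
smart_raw_value() {
    # Field 2 is ATTRIBUTE_NAME, field 10 is RAW_VALUE; stop at first match.
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Typical use on a real system (needs root and an ATA/SATA disk):
#   smartctl -A /dev/sda | smart_raw_value Power_On_Hours
```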
## smartctl

To make sure all SMART features are enabled on a disk (`-s on` enables SMART, `-o on` enables automatic offline testing, `-S on` enables attribute autosave):

~~~
# smartctl -s on -o on -S on /dev/sda
~~~

### Listing disks

On a machine with a single disk:

~~~
# smartctl --scan

/dev/sda -d scsi # /dev/sda, SCSI device
~~~

On a machine with hardware RAID:

~~~
# smartctl --scan

/dev/hdd -d ata # /dev/hdd, ATA device
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
~~~

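Each line of the scan output already carries the device path and the `-d` type needed to query that disk, so it can feed a loop over every detected device. The sketch below assumes the `<device> -d <type> # comment` line shape shown above; `scan_targets` is a hypothetical helper name. The commented loop uses `smartctl -H`, which prints the overall health assessment.

```shell
#!/bin/sh
# Turn `smartctl --scan` output (read on stdin) into "device type" pairs.
# scan_targets is a hypothetical helper, not a smartmontools command.
scan_targets() {
    # Keep only lines shaped like "<device> -d <type> # comment".
    awk '$2 == "-d" { print $1, $3 }'
}

# Typical use on a real system (needs root):
#   smartctl --scan | scan_targets | while read -r dev type; do
#       echo "== $dev ($type)"
#       smartctl -H -d "$type" "$dev"
#   done
```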
### Viewing disk information

The `-i` option displays identity information about a disk:

~~~
# smartctl -i /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Laptop Thin HDD
Device Model:     ST500LM021-1KJ152
Serial Number:    XXXXXXXX
LU WWN Device Id: 5 000c50 09cbac333
Firmware Version: 0005SDM1
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 28 16:19:49 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
~~~

The `-l error` option displays any errors logged by a disk (the sample output below is in the NVMe error log format):

~~~
# smartctl -l error /dev/sda

=== START OF SMART DATA SECTION ===
Error Information (NVMe Log 0x01, max 64 entries)
Num ErrCount SQId  CmdId Status PELoc LBA NSID VS
  0      120    0 0x0008 0x4004     -   0    0  -
  1      119    0 0x0018 0x4004 0x02c   0    0  -
  2      118    0 0x0017 0x4004 0x02c   0    0  -
  3      117    0 0x0008 0x4004     -   0    0  -
  4      116    0 0x0018 0x4004 0x02c   0    0  -
  5      115    0 0x0017 0x4004 0x02c   0    0  -
  6      114    0 0x0008 0x4004     -   0    0  -
  7      113    0 0x0018 0x4004 0x02c   0    0  -
  8      112    0 0x0017 0x4004 0x02c   0    0  -
  9      111    0 0x0008 0x4004     -   0    0  -
 10      110    0 0x0008 0x4004     -   0    0  -
 11      109    0 0x0008 0x4004 0x02c   0    0  -
 12      108    0 0x0008 0x4004 0x02c   0    0  -
 13      107    0 0x0018 0x4004 0x02c   0    0  -
 14      106    0 0x0017 0x4004 0x02c   0    0  -
 15      105    0 0x0008 0x4004 0x02c   0    0  -
... (48 entries not shown)
~~~

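Scripts do not have to parse the error log to learn that something is wrong: the exit status of `smartctl` is a bit mask, documented in the EXIT STATUS section of `smartctl(8)`, in which bit 6 means the device error log contains errors and bit 7 means the self-test log does. The following sketch decodes that mask; `describe_smartctl_status` is a hypothetical helper name and the messages are paraphrased from the man page.

```shell
#!/bin/sh
# Decode the smartctl exit status bit mask into one message per set bit.
# describe_smartctl_status is a hypothetical helper; the messages
# paraphrase the EXIT STATUS section of smartctl(8).
describe_smartctl_status() {
    _status=$1
    if [ "$_status" -eq 0 ]; then
        echo "ok"
        return 0
    fi
    _bit=0
    for _msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed, or checksum error" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "device self-test log contains errors"
    do
        # Test bit number $_bit of the status value.
        if [ $(( $_status & (1 << $_bit) )) -ne 0 ]; then
            echo "bit $_bit: $_msg"
        fi
        _bit=$(( $_bit + 1 ))
    done
}

# Typical use on a real system (needs root):
#   smartctl -q silent -a /dev/sda
#   describe_smartctl_status $?
```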
The `-a` option displays all SMART information:

~~~
# smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZVLW256HEHP-000L7
Serial Number:                      XXXXXXXX
Firmware Version:                   4L7QCXB7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 256 060 514 304 [256 GB]
Unallocated NVM Capacity:           0
Controller ID:                      2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256 060 514 304 [256 GB]
Namespace 1 Utilization:            208 604 237 824 [208 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Dec 4 00:16:33 2017 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL *Other*
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Warning Comp. Temp. Threshold:      69 Celsius
Critical Comp. Temp. Threshold:     72 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.60W        -        -    0  0  0  0        0       0
 1 +     6.00W        -        -    1  1  1  1        0       0
 2 +     5.10W        -        -    2  2  2  2        0       0
 3 -   0.0400W        -        -    3  3  3  3      210    1500
 4 -   0.0050W        -        -    4  4  4  4     2200    6000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:       ( 23) The self-test routine was aborted by
                                        the host.
Total time to complete Offline
data collection:                   ( 1) seconds.
Offline data collection
capabilities:                    (0x75) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:          ( 1) minutes.
Extended self-test routine
recommended polling time:          ( 1) minutes.
Conveyance self-test routine
recommended polling time:          ( 1) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       49872
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0030   100   100   000    Old_age   Offline      -       0
184 End-to-End_Error        0x0032   100   100   090    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       13
199 UDMA_CRC_Error_Count    0x0030   100   100   000    Old_age   Offline      -       5
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       575610
226 Load-in_Time            0x0032   100   100   000    Old_age   Always       -       18829
227 Torq-amp_Count          0x0032   100   100   000    Old_age   Always       -       0
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2992332
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   082   082   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       575610
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       581199

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed without error       10%     49872         -
# 2  Extended offline    Completed without error       00%     49872         -
# 3  Reserved (0x20)     Completed without error       00%     49872         -
# 4  Reserved (0x20)     Completed without error       10%        14         -
# 5  Reserved (0x20)     Completed without error       10%         4         -
# 6  Reserved (0x20)     Completed without error       10%         4         -
# 7  Vendor (0x58)       Completed without error       10%         4         -

Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
~~~

If the device is not a physical disk but a volume exposed by a hardware RAID controller, specify the controller type and the number of the physical disk to query:

~~~
# smartctl -i /dev/sda -d megaraid,0

=== START OF INFORMATION SECTION ===
Device Model:     SSDSC2BB480G7R
Serial Number:    XXXXXXXXXXXXXXXXXX
LU WWN Device Id: 5 5cd2e4 14d52d0aa
Add. Product Id:  DELL(tm)
Firmware Version: N201DL41
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 28 16:27:57 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
~~~

In some cases, the RAID controller exposes the physical disk through the generic SCSI (`sg`) kernel module:

~~~
# modprobe sg

# smartctl -i /dev/sg0

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 3.5" MG03ACAxxx(Y) Enterprise HDD
Device Model:     TOSHIBA MG03ACA100
Serial Number:    XXXXX
LU WWN Device Id: 5 000039 4eb981078
Add. Product Id:  DELL(tm)
Firmware Version: FL1D
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Dec 1 11:57:19 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
~~~

### Testing a disk

To run a short self-test on a disk:

~~~
# smartctl -t short /dev/sda

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Thu Dec 7 02:51:10 2017
~~~

The test results can be viewed with:

~~~
# smartctl -l selftest /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     49872         -
# 2  Reserved (0x20)     Completed without error       00%     49872         -
# 3  Reserved (0x20)     Completed without error       10%        14         -
# 4  Reserved (0x20)     Completed without error       10%         4         -
# 5  Reserved (0x20)     Completed without error       10%         4         -
# 6  Vendor (0x58)       Completed without error       10%         4         -
~~~

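For an automated check, the self-test log can be filtered for entries that did not finish cleanly. This is a rough sketch: `selftest_not_clean` is a hypothetical helper name, and it assumes the ATA log layout shown above, where each entry starts with `#` and a clean run reads `Completed without error`. Aborted or still-running tests are reported too, so the result is a hint to look closer, not proof of a failing disk.

```shell
#!/bin/sh
# Print self-test log entries (read on stdin) whose status is anything
# other than "Completed without error".
# Exit status is 0 when at least one such entry exists, 1 otherwise.
# selftest_not_clean is a hypothetical helper, not a smartmontools command.
selftest_not_clean() {
    grep '^#' | grep -v 'Completed without error'
}

# Typical use on a real system (needs root):
#   if smartctl -l selftest /dev/sda | selftest_not_clean; then
#       echo "self-test log needs attention" >&2
#   fi
```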
A long (extended) self-test can also be run:

~~~
# smartctl -t long /dev/sda
~~~

To abort a running test:

~~~
# smartctl -X /dev/sda

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!
~~~

## smartd

**smartd** is enabled by listing the devices to monitor in `/etc/default/smartmontools` (here with a polling interval of 1800 seconds, i.e. 30 minutes):

~~~
enable_smart="/dev/sda /dev/sdb"
start_smartd=yes
smartd_opts="--interval=1800"
~~~

The email address that receives alerts can then be customized with the `-m` directive in `/etc/smartd.conf`:

~~~
DEVICESCAN -d removable -n standby -m monitoring@example.com -M exec /usr/share/smartmontools/smartd-runner
~~~

## FAQ

See <https://www.smartmontools.org/wiki/FAQ>

### Device does not support SMART

Some disks do not support SMART. Example:

~~~
# smartctl -a /dev/sda

Device: ATA Maxtor 7Y250M0 Version: YAR5
Serial number: XXXXXX
Device type: disk
Local Time is: Thu Dec 7 01:59:43 2017 CET
Device does not support SMART

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
~~~

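Before adding a disk to monitoring, it can be useful to test for this case in a script. The sketch below relies on the `SMART support is: Enabled` line that `smartctl -i` prints for SMART-capable disks, as in the earlier examples; `smart_enabled` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed (exit 0) when the `smartctl -i` output read on stdin reports
# that SMART support is enabled, fail (exit 1) otherwise.
# smart_enabled is a hypothetical helper, not a smartmontools command.
smart_enabled() {
    grep -q '^SMART support is: *Enabled'
}

# Typical use on a real system (needs root):
#   if smartctl -i /dev/sda | smart_enabled; then
#       echo "/dev/sda: SMART enabled"
#   else
#       echo "/dev/sda: SMART unavailable or disabled" >&2
#   fi
```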
### NVMe support

By default, `smartd` (at least in the smartmontools 6.6 version shown above) does not monitor NVMe drives, because NVMe support is considered unstable. They can still be monitored by adding `-d nvme` to the configuration.

~~~ { .diff }
diff --git a/smartd.conf b/smartd.conf
index 4cdede7..81619c9 100644
--- a/smartd.conf
+++ b/smartd.conf
@@ -18,7 +18,7 @@
 # Directives listed below, which will be applied to all devices that
 # are found. Most users should comment out DEVICESCAN and explicitly
 # list the devices that they wish to monitor.
-DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
+DEVICESCAN -d removable -d nvme -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
~~~