Why data redundancy is useful

In May 2008 I installed two 250 GB Western Digital SATA 2 hard drives, configured as RAID 1 (mirror), in the machine that hosts this site (among others).
After 30,986 hours of almost continuous operation (more than 3 years), one of them is on its deathbed.
The first symptom was that the drive dropped out of the software RAID array, which then switched to the "Degraded" state.
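
A degraded mirror like this can be spotted automatically. The sketch below assumes a Linux md software RAID, which reports array state in /proc/mdstat; the post does not say which RAID implementation was used, so treat that as an assumption. The sample text, device names, and block count are hypothetical, and the simple `md<number>` name pattern would miss arrays named differently.

```python
import re

# Hypothetical /proc/mdstat content for a two-disk RAID 1 that lost a member.
# Device names and block counts are illustrative, not taken from the post.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1](F)
      244195904 blocks [2/1] [U_]

unused devices: <none>
"""

ARRAY_LINE = re.compile(r"^(md\d+)\s*:")
# "[2/1] [U_]" means 2 members expected, 1 active; "_" marks a missing member.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays with missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if "_" in status.group(3) or active < expected:
                degraded[current] = (expected, active)
            current = None
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real machine the same function could be fed the contents of /proc/mdstat from a cron job, so that a dropped disk is noticed before the second one fails.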

The truth is that both drives ran at fairly high temperatures, since the machine sits in a room without air conditioning. As a result, my important data currently lives on a single drive, which bothers me. I need to buy two SATA 2 drives in the near future to get data redundancy back.

At the moment, smartctl reports:

 

Model Family:     Western Digital Caviar SE16 Serial ATA family
Device Model:     WDC WD2500KS-00MJB0
Serial Number:    WD-WCANKM102446
Firmware Version: 02.01C03
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Dec  4 18:55:47 2011 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            235   188   021    Pre-fail  Always       -       3233
  4 Start_Stop_Count        100   100   000    Old_age   Always       -       127
  5 Reallocated_Sector_Ct   104   104   140    Pre-fail  Always   FAILING_NOW 763
  7 Seek_Error_Rate         200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          058   058   000    Old_age   Always       -       30986
 10 Spin_Retry_Count        100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       100   100   000    Old_age   Always       -       127
190 Airflow_Temperature_Cel 063   037   045    Old_age   Always   In_the_past 37
194 Temperature_Celsius     113   087   000    Old_age   Always       -       37
196 Reallocated_Event_Count 001   001   000    Old_age   Always       -       15166
197 Current_Pending_Sector  200   182   000    Old_age   Always       -       1
198 Offline_Uncorrectable   200   182   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   160   001   051    Pre-fail  Offline  In_the_past 1351

SMART Error Log Version: 1
ATA Error Count: 64 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 64 occurred at disk power-on lifetime: 30727 hours (1280 days + 7 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 e0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 01 00 00 00 00 00   3d+02:05:36.920  IDENTIFY DEVICE
  c8 00 01 00 00 00 00 00   3d+02:01:23.505  READ DMA
  c8 00 01 80 00 00 00 00   3d+02:01:23.504  READ DMA
  c8 00 01 10 00 00 00 00   3d+02:01:23.502  READ DMA
  c8 00 01 02 00 00 00 00   3d+02:01:23.501  READ DMA

Error 63 occurred at disk power-on lifetime: 30727 hours (1280 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 01 00 00 00 e0  Device Fault; Error: ABRT 1 sectors at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 01 00 00 00 00 00   3d+02:01:23.505  READ DMA
  c8 00 01 80 00 00 00 00   3d+02:01:23.504  READ DMA
  c8 00 01 10 00 00 00 00   3d+02:01:23.502  READ DMA
  c8 00 01 02 00 00 00 00   3d+02:01:23.501  READ DMA
  c8 00 01 00 00 00 00 00   3d+02:01:23.500  READ DMA

Error 62 occurred at disk power-on lifetime: 30727 hours (1280 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 01 80 00 00 e0  Device Fault; Error: ABRT 1 sectors at LBA = 0x00000080 = 128

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 01 80 00 00 00 00   3d+02:01:23.504  READ DMA
  c8 00 01 10 00 00 00 00   3d+02:01:23.502  READ DMA
  c8 00 01 02 00 00 00 00   3d+02:01:23.501  READ DMA
  c8 00 01 00 00 00 00 00   3d+02:01:23.500  READ DMA
  c8 00 01 40 00 00 00 00   3d+02:01:23.498  READ DMA

Error 61 occurred at disk power-on lifetime: 30727 hours (1280 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 01 10 00 00 e0  Device Fault; Error: ABRT 1 sectors at LBA = 0x00000010 = 16

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 01 10 00 00 00 00   3d+02:01:23.502  READ DMA
  c8 00 01 02 00 00 00 00   3d+02:01:23.501  READ DMA
  c8 00 01 00 00 00 00 00   3d+02:01:23.500  READ DMA
  c8 00 01 40 00 00 00 00   3d+02:01:23.498  READ DMA
  c8 00 10 00 02 00 00 00   3d+02:01:23.497  READ DMA

Error 60 occurred at disk power-on lifetime: 30727 hours (1280 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 01 02 00 00 e0  Device Fault; Error: ABRT 1 sectors at LBA = 0x00000002 = 2

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 01 02 00 00 00 00   3d+02:01:23.501  READ DMA
  c8 00 01 00 00 00 00 00   3d+02:01:23.500  READ DMA
  c8 00 01 40 00 00 00 00   3d+02:01:23.498  READ DMA
  c8 00 10 00 02 00 00 00   3d+02:01:23.497  READ DMA
  c8 00 10 00 00 00 00 00   3d+02:01:23.495  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
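
The WHEN_FAILED column in the attribute table follows a simple rule documented by smartmontools: an attribute is failing when its normalized VALUE is less than or equal to THRESH, and it failed in the past when only WORST is. The sketch below re-derives that column from a few rows copied from the output above, and also converts power-on hours to days. The function names are my own, and the parser is a minimal one that assumes the column layout shown here.

```python
import re

# Rows copied from the smartctl attribute table in this post.
SAMPLE_ATTRIBUTES = """\
  1 Raw_Read_Error_Rate     200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   104   104   140    Pre-fail  Always   FAILING_NOW 763
  9 Power_On_Hours          058   058   000    Old_age   Always       -       30986
190 Airflow_Temperature_Cel 063   037   045    Old_age   Always   In_the_past 37
196 Reallocated_Event_Count 001   001   000    Old_age   Always       -       15166
200 Multi_Zone_Error_Rate   160   001   051    Pre-fail  Offline  In_the_past 1351
"""

ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<value>\d+)\s+(?P<worst>\d+)\s+"
    r"(?P<thresh>\d+)\s+(?P<type>\S+)\s+(?P<updated>\S+)\s+"
    r"(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)


def classify(value, worst, thresh):
    """Recompute WHEN_FAILED from the normalized values."""
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


def failed_attributes(table_text):
    """Return {attribute_name: status} for attributes that failed now or earlier."""
    failed = {}
    for line in table_text.splitlines():
        row = ROW.match(line)
        if not row:
            continue
        status = classify(
            int(row.group("value")), int(row.group("worst")), int(row.group("thresh"))
        )
        if status != "-":
            failed[row.group("name")] = status
    return failed


def hours_to_days(hours):
    """Split power-on hours into (whole days, remaining hours)."""
    return divmod(hours, 24)


if __name__ == "__main__":
    for name, status in failed_attributes(SAMPLE_ATTRIBUTES).items():
        print(name, status)
    print(hours_to_days(30986))
```

Applied to these rows, the rule flags Reallocated_Sector_Ct (104 against a threshold of 140) as failing now, which is what triggers the overall FAILED! verdict, while the temperature and multi-zone attributes only crossed their thresholds in the past.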

Bogdan Turcanu

One thought on “Why data redundancy is useful”

  1. This caught you at a pretty bad time... HDD prices have doubled because of some floods in Thailand, the second-largest hard drive producer after China.
    SSD prices, however, have stayed the same, but at the capacity you need that option might turn out too expensive...
