Wednesday, 20 May 2015

One of the USB flash drives in my software RAID has failed. The challenge: finding out which one.

A few months ago I had a brainwave. What if I bought a few USB drives and configured them as a RAID array on my Raspberry Pi 2 B? Would this be stable enough to act as a secure drive to put files on? The answer is: YES.

During the first few days some drives failed; I could reflash them with a vendor tool and reformat them to make them work again. Over the last few days I've been receiving emails again telling me that something is wrong.

This is an automatically generated mail message from mdadm
running on serverpi

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] 
md0 : active raid1 sdc1[4] sda1[0] sdb1[3](F)
      15417216 blocks super 1.2 [3/2] [U_U]
      
unused devices: <none>
 It looks like /dev/sdb1 failed. So I logged in to my Pi to try to repair it.
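The failed member is the one carrying the (F) flag in the md0 line. As a minimal sketch, assuming the mdstat layout shown above, a small shell function can pick those members out so you don't have to read the line by eye. The name failed_members is just something I made up for illustration; it is not part of mdadm.

```shell
# Print the member devices flagged (F) in mdstat-formatted text on stdin.
# failed_members is a hypothetical helper name, not an mdadm command.
failed_members() {
    # Match tokens such as "sdb1[3](F)", then strip the "[3](F)" suffix.
    grep -o '[[:alnum:]]*\[[0-9]*](F)' | sed 's/\[.*//'
}

# Usage on a live system:
#   failed_members < /proc/mdstat
```

On a healthy array this prints nothing at all.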

First of all I used "sudo mdadm -D /dev/md0" to check my array.


/dev/md0:
        Version : 1.2
  Creation Time : Sun Dec 14 09:28:08 2014
     Raid Level : raid1
     Array Size : 15417216 (14.70 GiB 15.79 GB)
  Used Dev Size : 15417216 (14.70 GiB 15.79 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Wed May 20 08:49:15 2015
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

           Name : serverpi:0  (local to host serverpi)
           UUID : 52b533f2:c23cf930:d90ba49f:0998349d
         Events : 1722237

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed
       4       8       33        2      active sync   /dev/sdc1

       3       8       17        -      faulty spare   /dev/sdb1
And sure enough, /dev/sdb1 is the problem. It is faulty and has been taken out of the array automatically.

Now I have a challenge. I want to take this drive out, and reflash it again with the vendor tool, just to be safe. But which one is it??? They all look so similar ;-)


Luckily there is a way to find out.
The Raspberry Pi internally uses a USB hub to connect the on-board Ethernet adapter as well as the four external ports you see in the picture.

If you run the command "lsusb -t" you can see how everything is populated on my (or your) Pi.
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc_otg/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/5p, 12M
        |__ Port 1: Dev 3, If 0, Class=vend., Driver=smsc95xx, 12M
        |__ Port 2: Dev 4, If 0, Class=stor., Driver=usb-storage, 12M
        |__ Port 3: Dev 5, If 0, Class=stor., Driver=usb-storage, 12M
        |__ Port 5: Dev 6, If 0, Class=stor., Driver=usb-storage, 12M
OK, I've got flash drives on Bus 01, Ports 2, 3 and 5, and nothing on Port 4. So we have already learned that the top right port in the picture is Port 4.

And from previous trials I've found out that ports 2&3 and 4&5 are on the same dual USB socket.

 
So:
Top left Port2
Bottom left Port3
Top right Port4
Bottom right Port5
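To avoid working this out again next time, the table above can be kept as a tiny lookup. This is only a sketch of the layout I found on my own Raspberry Pi 2 B; other models may be wired differently, and port_to_socket is a hypothetical helper name.

```shell
# Map a hub port number to the physical socket, using the layout
# found above on this particular Raspberry Pi 2 B (an assumption for
# any other board). port_to_socket is a hypothetical helper name.
port_to_socket() {
    case "$1" in
        2) echo "top left" ;;
        3) echo "bottom left" ;;
        4) echo "top right" ;;
        5) echo "bottom right" ;;
        *) echo "unknown" ;;   # port 1 is the on-board Ethernet adapter
    esac
}

# Usage:
#   port_to_socket 3
```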

With the command "sudo lshw" you can find out what drives are connected to what USB-ports.
     *-scsi:0
          physical id: 5
          bus info: usb@1:1.2
          logical name: scsi0
          capabilities: emulated
        *-disk
             description: SCSI Disk
             physical id: 0.0.0
             bus info: scsi@0:0.0.0
             logical name: /dev/sda
             size: 14GiB (15GB)
             capabilities: partitioned partitioned:dos
             configuration: sectorsize=512 signature=c3072e18
           *-volume
                description: Windows FAT volume
                vendor: MSDOS5.0
                physical id: 1
                bus info: scsi@0:0.0.0,1
                logical name: /dev/sda1
                version: FAT32
                serial: 5663-89a9
                size: 14GiB
                capacity: 14GiB
                capabilities: primary fat initialized
                configuration: FATs=2 filesystem=fat
     *-scsi:1
          physical id: 6
          bus info: usb@1:1.3
          logical name: scsi1
          capabilities: emulated
        *-disk
             description: SCSI Disk
             physical id: 0.0.0
             bus info: scsi@1:0.0.0
             logical name: /dev/sdb
             size: 14GiB (15GB)
             capabilities: partitioned partitioned:dos
             configuration: sectorsize=512 signature=c3072e18
           *-volume UNCLAIMED
                description: Linux filesystem partition
                physical id: 1
                bus info: scsi@1:0.0.0,1
                capacity: 14GiB
                capabilities: primary
     *-scsi:2
          physical id: 7
          bus info: usb@1:1.5
          logical name: scsi2
          capabilities: emulated
        *-disk
             description: SCSI Disk
             physical id: 0.0.0
             bus info: scsi@2:0.0.0
             logical name: /dev/sdc
             size: 14GiB (15GB)
             capabilities: partitioned partitioned:dos
             configuration: sectorsize=512 signature=c3072e18
           *-volume
                description: EXT4 volume
                vendor: Linux
                physical id: 1
                bus info: scsi@2:0.0.0,1
                logical name: /dev/sdc1
                version: 1.0
                serial: 7e3913ee-a998-4db2-942f-8c1c4f11fea4
                size: 14GiB
                capacity: 14GiB
                capabilities: primary journaled extended_attributes large_files huge_files dir_nlink extents ext4 ext2 initialized
 
Oops. I seem to have a FAT partition on /dev/sda1 and an ext4 partition on /dev/sdc1. Shame on me. I'll fix that later. First repair the hardware...

So I have /dev/sdb on usb@1:1.3. This should be the lower left USB-socket.
Let's pull that one and repair it with that nasty online Transcend windows-tool.
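If lshw is not installed, sysfs should be able to tell you the same thing. This is a sketch under an assumption: on Linux the resolved path of /sys/block/sdX for a USB stick normally contains a component like "1-1.3" (bus 1, root port 1, hub port 3). The function below just extracts that component from a path string; usb_port_from_path is a hypothetical name.

```shell
# Extract the USB topology component (e.g. "1-1.3") from a resolved
# sysfs path. usb_port_from_path is a hypothetical helper name.
usb_port_from_path() {
    # Split the path into components, keep those shaped like
    # "<bus>-<port>[.<port>...]", and print the deepest one.
    printf '%s\n' "$1" | tr '/' '\n' \
        | grep -E '^[0-9]+-[0-9]+(\.[0-9]+)*$' | tail -n 1
}

# Usage on a live system (the drive name is just an example):
#   usb_port_from_path "$(readlink -f /sys/block/sdb)"
```

The last number in the result is the hub port, which matches the port numbers that "lsusb -t" prints.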

OK repair succeeded. Now just a few more actions to add the drive to the array and rebuild it. 

By using "sudo lshw" again you can find out what drive letter the new/repaired USB-drive got. In my case it's now /dev/sdd.
     *-scsi:1
          physical id: 6
          bus info: usb@1:1.3
          logical name: scsi3
          capabilities: emulated
        *-disk
             description: SCSI Disk
             physical id: 0.0.0
             bus info: scsi@3:0.0.0
             logical name: /dev/sdd

Just a few more commands to get back on track.

sudo fdisk /dev/sdd -> option "t" -> change partition system id to Linux (83) (I hate Windows)
sudo mkfs.ext4 /dev/sdd1 (make a Linux ext4 filesystem on /dev/sdd1)
sudo mdadm /dev/md0 -r detached (to remove the detached drive from the array)
sudo mdadm --add /dev/md0 /dev/sdd1 (add the new drive back to the array)
sudo mdadm -D /dev/md0 (check the array status).


/dev/md0:
        Version : 1.2
  Creation Time : Sun Dec 14 09:28:08 2014
     Raid Level : raid1
     Array Size : 15417216 (14.70 GiB 15.79 GB)
  Used Dev Size : 15417216 (14.70 GiB 15.79 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Wed May 20 10:18:41 2015
          State : active, degraded, recovering
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           Name : serverpi:0  (local to host serverpi)
           UUID : 52b533f2:c23cf930:d90ba49f:0998349d
         Events : 1729099

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       3       8       49        1      spare rebuilding   /dev/sdd1
       4       8       33        2      active sync   /dev/sdc1

Now we wait for the rebuild and we are back in business.

If you want more detail about the rebuild status, you can run "watch cat /proc/mdstat" and see your rebuild progress. I will not wait for this as it will take another 474.6 minutes.
Personalities : [raid1]
md0 : active raid1 sdd1[3] sdc1[4] sda1[0]
      15417216 blocks super 1.2 [3/2] [U_U]
      [>....................]  recovery =  2.9% (457856/15417216) finish=474.6min speed=524K/sec
     
unused devices: <none>
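If you only care about the progress figures, the recovery line can be boiled down to one short sentence. This is a minimal sketch that assumes the recovery line looks like the one above; recovery_progress is a hypothetical helper name.

```shell
# Summarise the recovery line of mdstat-formatted text on stdin.
# recovery_progress is a hypothetical helper name, not an mdadm command.
recovery_progress() {
    # Capture the percentage and the estimated time to finish.
    sed -n 's/.*recovery = *\([0-9.]*%\).*finish=\([0-9.]*min\).*/\1 done, about \2 left/p'
}

# Usage on a live system:
#   recovery_progress < /proc/mdstat
```

When no rebuild is running there is no recovery line, so nothing is printed.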
I hope this helped you. If not, it will surely help me the next time one of my drives fails.



