The FreeBSD Diary

Providing practical examples since 1998

ZFS: do not give it all your HDD
1 August 2010

I'm about to rebuild my ZFS array (which I documented in my other diary). The array has been running for a while, but I recently learned some new facts about ZFS which spurred me on to rebuilding my array with future-proofing in mind.

This is my plan for tonight. As I type this, Jerry is over tonight, doing the heavy lifting for me. I am nursing a broken left elbow. The two new HDD have been installed and the system has been powered back up.

Tonight we will do the following:

  • identify the newly installed HDD
  • put a file system on those HDD
  • copy the existing ZFS array over to that new FS
  • destroy the existing ZFS array
  • partition each individual drive using gpart
  • add the drives back into the array
  • copy the data back
  • partition the two new HDD and put them into the new array

This article originally covered all of the above steps. That soon led to a multi-day 3000 line document. I thought it best to break it up into a few smaller articles. To this end, this article will cover only the part about partitioning HDD so as to avoid future problems at replacement time.

Don't use all your HDD

In this section, let's assume you are building a new ZFS array. I will talk about how I like to partition my HDD with a little buffer zone, and why.

Let's assume ada0 and ada6 are the drives you want to use. This is the list of ada devices from dmesg:

ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada7: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
ada8: 152587MB (312500000 512 byte sectors: 16H 63S/T 16383C)

As you can see, each of the first seven devices contains 3907029168 sectors of 512 bytes each, for a total of 1863 GB on a 2TB HDD. However, not all 2TB HDD contain this same space. Even the same model of HDD can vary. If you are using ZFS, you should be aware of the following from man zpool:

zpool replace [-f] pool old_device [new_device]
    Replaces old_device with new_device. This is equivalent to attaching
    new_device, waiting for it to resilver, and then detaching old_device.

    The size of new_device must be greater than or equal to the minimum
    size of all the devices in a mirror or raidz configuration.

Thus, if your replacement HDD is just 1 sector smaller than the original, you cannot use it.
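
To make that size rule concrete, here is a small sh sketch of the arithmetic. The helper names sectors_to_bytes and can_replace are made up for this illustration, and the 512 byte sector size is taken from the dmesg output above. It compares against a single old device only, and it merely mirrors the rule quoted from the man page; it is not what zpool does internally.

```shell
#!/bin/sh
# Illustrative only: these helper names are hypothetical, not part of ZFS.

# Convert a sector count to bytes, assuming 512-byte sectors as in dmesg above.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# Succeed (exit 0) if a replacement with $2 sectors is at least as large
# as the device it replaces ($1 sectors); fail otherwise.
can_replace() {
    [ "$2" -ge "$1" ]
}

sectors_to_bytes 3907029168          # prints 2000398934016

# A replacement that is one sector short of the original.
if can_replace 3907029168 3907029167; then
    echo "replacement fits"
else
    echo "replacement is too small"  # this branch is taken
fi
```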

But there is a cunning plan. Partition the HDD and give only the partition to ZFS. Now, this won't help you after the fact if your array already has a failed drive. This strategy is only useful when setting up a new array. The idea is to use slightly less than your entire HDD. Thus, if a replacement HDD happens to be smaller, you're covered.

Using gpart

There are other approaches to this, but I'm using gpart.

# gpart create -s GPT ad1
gpart: provider 'ad1': Invalid argument

Oh. Yes, wrong name. Let's try this:

# gpart create -s GPT ada0

Now let's see what we have:

# gpart show ada0
=>        34  3907029101  ada0  GPT  (1.8T)
          34  3907029101        - free -  (1.8T)

From the above, we can see 3907029101 free sectors, starting at sector 34. No partition exists yet. Each sector is 512 bytes, as can be seen in the "sector size" line here:

# camcontrol identify ada0
pass0: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-8 SATA 2.x
device model          Hitachi HDS722020ALA330
firmware revision     JKAOA28A
serial number         JK1131YAHLJWLV
WWN                   5000cca221d68596
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         268435455 sectors
LBA48 supported       3907029168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             7200

Feature                      Support  Enable    Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/0x00
automatic acoustic management  yes      no      254/0xFE        128/0x80
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              no       no      0/0x0
unload                         no       no
free-fall                      no       no
data set management (TRIM)     no

I plan to leave 200MB free at the end of each HDD. Thus, the gpart command to add a new partition is:

# gpart add -b 2048 -s 3906824301 -t freebsd-zfs -l disk00 ada0

Please note that the above math is incorrect, but only slightly. It leaves some 99MB free instead of 200MB, which is completely acceptable for this effort. The correct math is:

# gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk00 ada0

  • -b 2048 starts the partition 2048 sectors in from the start of the disk (leaving 1MB free)
  • the start is also on a 4KB boundary, which will give better performance on some HDD
  • -s 3906617453 leaves us 200MB free at the end of the HDD (the -s 3906824301 in the first command is the incorrect math, leaving about 99MB)
  • -l disk00 creates a label which you can use when adding this device to the pool
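
If you want to redo the math for a different drive, the calculation can be sketched in sh. The function part_size is a hypothetical helper of mine, not a gpart feature. It assumes 512 byte sectors (so 1MB is 2048 sectors) and takes the free sector count straight from gpart show.

```shell
#!/bin/sh
# Illustrative only: part_size is a hypothetical helper, not a gpart feature.
# Arguments: free sectors reported by 'gpart show', start sector (-b),
#            and MB to leave unused at the end of the disk.
part_size() {
    free_sectors=$1
    start=$2
    reserve_mb=$3
    # 1 MB = 1024 * 1024 bytes = 2048 sectors of 512 bytes
    reserve_sectors=$(( reserve_mb * 2048 ))
    # Free sectors, minus the reserve at the end, minus the gap at the start.
    echo $(( free_sectors - reserve_sectors - start ))
}

part_size 3907029101 2048 200        # prints 3906617453
```

Subtracting the full start offset, rather than the offset less the 34 sectors GPT already skips, reserves 34 sectors more than strictly needed, which is harmless. For comparison, part_size 3907029101 0 100 prints 3906824301, the value from the first command, which may be where the missing space went.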

Creating the pool

Let's assume we did the above with 5 HDD. The command to create the new pool is:

# zpool create -f storage raidz2 gpt/disk00 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04

# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested

        NAME                      STATE     READ WRITE CKSUM
        storage                   ONLINE       0     0     0
          raidz2                  ONLINE       0     0     0
            gpt/disk00            ONLINE       0     0     0
            gpt/disk01            ONLINE       0     0     0
            gpt/disk02            ONLINE       0     0     0
            gpt/disk03            ONLINE       0     0     0
            gpt/disk04            ONLINE       0     0     0

errors: No known data errors

There, done.
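
Partitioning five drives by hand is repetitive, so here is a sketch of a loop that prints the gpart commands for each drive without running anything. The function name print_partition_cmds is made up, and the device names ada0 through ada4 are only an assumption; substitute whichever drives go into your pool. The -b and -s values are the corrected ones from above.

```shell
#!/bin/sh
# Illustrative only: prints commands, runs nothing. Device names are assumed.
print_partition_cmds() {
    n=0
    for dev in "$@"; do
        # Labels count up from disk00, matching the names used in the pool.
        label=$(printf 'disk%02d' "$n")
        echo "gpart create -s GPT $dev"
        echo "gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l $label $dev"
        n=$(( n + 1 ))
    done
}

print_partition_cmds ada0 ada1 ada2 ada3 ada4
```

Nothing is executed here. Once the printed commands look right, they can be pasted into a root shell.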

When it comes time to replace one of the above devices, let's say gpt/disk02, you do this:

# zpool offline storage gpt/disk02

Then you remove that HDD from the system, and insert the new HDD. You partition the new HDD just like you did above, using the same -b and -s values and the same label (disk02), and you instantly have a new partition exactly the same size as all the others. Only the unused space at the end of the HDD differs.

Now add that disk back in:

# zpool replace storage gpt/disk02

Done. Let the array resilver, and you're good to go. Hopefully, this approach will save us both from headaches in the future.
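
Before partitioning a replacement drive, it may be worth checking that the old -b and -s values still fit on it. This is a sketch of that check in sh. The helper fits_on_disk is hypothetical, and the smaller sector counts in the examples are invented for illustration, not taken from a real drive.

```shell
#!/bin/sh
# Illustrative only: fits_on_disk is a hypothetical helper.
# Arguments: first usable sector and free-sector count as shown by
# 'gpart show' on the new drive, then the -b and -s values used on the
# existing drives.
fits_on_disk() {
    first=$1
    count=$2
    start=$3
    size=$4
    last_usable=$(( first + count - 1 ))
    last_needed=$(( start + size - 1 ))
    # The partition must begin and end inside the usable area.
    [ "$start" -ge "$first" ] && [ "$last_needed" -le "$last_usable" ]
}

# Invented example: a drive with 100000 fewer sectors than the originals.
if fits_on_disk 34 3906929101 2048 3906617453; then
    echo "same partition size fits"   # this branch is taken
else
    echo "drive is too small even with the buffer"
fi
```

The 200MB buffer is 409600 sectors, so a replacement can be that much smaller than the original drives before this check fails.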
