OK – we’re good to go for our final ZFS configuration. Recall from earlier that I will be configuring a two-disk RAID1 set, with an extra disk for hot-spare use, and the final disk to play with for backups, encryption, dedup, etc.

# zpool create datapool mirror c8t1d0 c8t2d0 spare c8t3d0

It was that easy:

# zpool list
NAME      SIZE   ALLOC FREE   CAP  DEDUP  HEALTH  ALTROOT
datapool  19.9G  88K   19.9G  0%   1.00x  ONLINE  -
rpool     29.5G  4.10G 25.4G  13%  1.00x  ONLINE  -
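
If you want to keep an eye on pool health from a script (cron, monitoring), the HEALTH column can be checked automatically. This is a minimal sketch and not part of the original setup. The function name `unhealthy_pools` is hypothetical, and it assumes the scripted output of `zpool list -H -o name,health`, where `-H` suppresses the header and separates fields with a single tab.

```shell
#!/bin/sh
# unhealthy_pools (hypothetical helper): reads "NAME<TAB>HEALTH" lines, as
# printed by "zpool list -H -o name,health", prints every pool whose health
# is not ONLINE, and returns non-zero if it found any.
unhealthy_pools() {
    awk 'BEGIN { FS = "\t" }
         $2 != "ONLINE" { print $1; bad = 1 }
         END { exit bad + 0 }'
}
```

Usage: `zpool list -H -o name,health | unhealthy_pools || echo "check zpool status"`.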

Querying the pool confirms that all the required elements (the mirror and the hot spare) are in place:

# zpool status datapool
pool: datapool
state: ONLINE
scan: none requested
config:
NAME            STATE   READ WRITE CKSUM
datapool        ONLINE     0     0     0
  mirror-0      ONLINE     0     0     0
    c8t1d0      ONLINE     0     0     0
    c8t2d0      ONLINE     0     0     0
spares
  c8t3d0        AVAIL
errors: No known data errors
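
The hot spare can also be picked out of that output programmatically, for instance to raise an alert if it is ever no longer AVAIL. A minimal sketch, assuming the `zpool status` layout shown above; `list_spares` is a hypothetical name, and the sketch only stops at a blank line, the errors line, or a logs/cache heading, so other layouts may need adjusting.

```shell
#!/bin/sh
# list_spares (hypothetical helper): reads "zpool status" output on stdin and
# prints "device state" for each device listed under the spares section.
list_spares() {
    awk '
        $1 == "spares"                { in_spares = 1; next }
        NF == 0 || $1 == "errors:"    { in_spares = 0 }
        $1 == "logs" || $1 == "cache" { in_spares = 0 }
        in_spares                     { print $1, $2 }
    '
}
```

Usage: `zpool status datapool | list_spares`, which for this pool should print `c8t3d0 AVAIL`.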

The pool's top-level dataset has also been created automatically:

# zfs list datapool
NAME      USED  AVAIL  REFER  MOUNTPOINT
datapool  88K   19.6G  31K    /datapool

For the pool's top-level dataset, that default mountpoint is fine; for each new dataset we create, we will set a more appropriate mountpoint.

So – let’s assume we need to install some software, and require a /u01 filesystem on which to install (a common sight if you’ve worked around Oracle software long enough).

# zfs create datapool/u01

That simple command has created the new dataset for us:

# zfs list | grep u01
datapool/u01 31K 19.6G 31K /datapool/u01
# zfs list | grep datapool
datapool 128K 19.6G 32K /datapool
datapool/u01 31K 19.6G 31K /datapool/u01

Next, let’s move the mountpoint to somewhere more sensible, namely /u01:

# zfs set mountpoint=/u01 datapool/u01
# zfs list datapool/u01
NAME         USED  AVAIL  REFER MOUNTPOINT
datapool/u01 31K   19.6G  31K   /u01
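
The create-then-set sequence can be collapsed into one step, because `zfs create` accepts properties with `-o` (the same option used for datapool/dedupfs later on). The sketch below wraps that in a small helper with a dry-run switch so it can be tried safely; `run`, `create_fs` and the `DRY_RUN` variable are hypothetical names for illustration, not ZFS features.

```shell
#!/bin/sh
# run (hypothetical): echo a command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# create_fs DATASET MOUNTPOINT (hypothetical): create a dataset and set its
# mountpoint in a single zfs create call.
create_fs() {
    run zfs create -o "mountpoint=$2" "$1"
}
```

Usage: `DRY_RUN=1 create_fs datapool/u01 /u01` prints `+ zfs create -o mountpoint=/u01 datapool/u01` without touching the pool; run it without `DRY_RUN` (as root) to actually create the dataset.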

Again, a simple task, and ZFS even created the previously non-existent /u01 directory for us:

# df -h /u01
Filesystem     Size  Used  Available Capacity Mounted on
datapool/u01   20G   31K   20G       1%       /u01

Datasets share the pool's free space, so datapool/u01 reports the same available space as its parent, datapool, and that figure will only shrink as data is written to it or to other datasets in the pool.

Let’s turn on deduplication for the parent, so that any other datasets created within datapool will inherit it. I won’t turn on encryption or compression here; I will control those at a finer-grained, per-dataset level.

# zfs set dedup=on datapool

Then we verify:

# zfs get dedup datapool
NAME      PROPERTY  VALUE  SOURCE
datapool  dedup     on     local

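datapool/u01, however, reports dedup=off with a SOURCE of local: the property has been set explicitly on that dataset, and a locally set value takes precedence over anything inherited from the parent. So we need to head over there and set it on for that dataset too: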

# zfs get dedup datapool/u01
NAME          PROPERTY VALUE SOURCE
datapool/u01  dedup    off   local
# zfs set dedup=on datapool/u01
# zfs get dedup datapool/u01
NAME          PROPERTY VALUE SOURCE
datapool/u01  dedup    on    local
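
As the number of datasets grows, it helps to see at a glance which ones override an inherited property. A minimal sketch, assuming the scripted output of `zfs get -r -H -o name,value,source dedup datapool` (recursive, no header, tab-separated fields); `local_overrides` is a hypothetical helper name.

```shell
#!/bin/sh
# local_overrides (hypothetical helper): reads "name<TAB>value<TAB>source"
# lines and prints "name=value" for every dataset whose value is set locally
# rather than inherited or defaulted.
local_overrides() {
    awk 'BEGIN { FS = "\t" } $3 == "local" { print $1 "=" $2 }'
}
```

Usage: `zfs get -r -H -o name,value,source dedup datapool | local_overrides`. To drop a local setting so that a dataset follows its parent again, `zfs inherit dedup datapool/u01` clears the local value; that is an alternative to setting dedup=on on the child explicitly.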

Let’s start creating some more interesting child datasets.

ZFS Features

The first feature I wanted to try out was deduplication. Essentially, when the same block is written multiple times within a pool, ZFS stores it only once and points every other occurrence at that single copy, improving storage utilisation.
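
To make the idea concrete, the sketch below estimates a block-level dedup ratio for a set of files outside ZFS. It cuts each file into fixed 128K pieces (ZFS's default recordsize), hashes each piece with SHA-256 (the checksum used when dedup=on), and divides the total number of blocks by the number of distinct hashes. This is only an approximation of what ZFS does inline at write time: it assumes GNU coreutils `sha256sum` is available, `dedup_ratio` is a hypothetical name, and because ZFS stores small files in a single smaller block, its figures will not match zdb exactly.

```shell
#!/bin/sh
# dedup_ratio FILE... (hypothetical): estimate how well a set of files would
# deduplicate at 128 KiB block granularity. Prints total/unique blocks.
dedup_ratio() {
    tmp=$(mktemp -d) || return 1
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # Cut each file into 131072-byte blocks: f1.aaaaaa, f1.aaaaab, ...
        split -a 6 -b 131072 "$f" "$tmp/f$n."
    done
    total=$(find "$tmp" -type f | wc -l)
    if [ $total -eq 0 ]; then
        rm -rf "$tmp"
        return 1
    fi
    # Identical blocks hash to identical SHA-256 digests, so the number of
    # distinct digests is the number of blocks that would actually be stored.
    unique=$(find "$tmp" -type f -exec sha256sum {} + |
        awk '{ print $1 }' | sort -u | wc -l)
    rm -rf "$tmp"
    awk -v t="$total" -v u="$unique" 'BEGIN { printf "%.2f\n", t / u }'
}
```

For three identical copies of the same files the result is at least 3.00; anything above that comes from duplicate blocks within the files themselves.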

I set dedup=on on datapool so any new dataset created will inherit that property from its parent. Therefore:

# zfs create -o mountpoint=/dedupfs datapool/dedupfs

Verify as always:

# zfs get dedup datapool/dedupfs
NAME              PROPERTY VALUE SOURCE
datapool/dedupfs  dedup    on    inherited from datapool

You even get a nice little note that the value for this property has been inherited from the parent ZFS dataset – datapool.

# mkdir /dedupfs/{a,b,c}
# for i in a b c; do time cp -Rp /usr/sbin/* /dedupfs/${i}; done
real 0m0.892s
user 0m0.030s
sys 0m0.567s
real 0m0.608s
user 0m0.032s
sys 0m0.570s
real 0m0.863s
user 0m0.033s
sys 0m0.788s

If we check with du, the full logical sizes of the copies are reported, as if no deduplication had taken place:

# du -hs /dedupfs/*
88M /dedupfs/a
88M /dedupfs/b
88M /dedupfs/c

zpool list, however, tells a different story:

# zpool list datapool
NAME     SIZE   ALLOC  FREE   CAP DEDUP HEALTH  ALTROOT
datapool 19.9G  37.1M  19.8G  0%  7.37x ONLINE  -

You can see that only 37.1M is actually allocated, thanks to a deduplication ratio of 7.37x. Our three copies of /usr/sbin hold roughly 264M of data (3 × 88M), so the system has saved us over 200M. The ratio is well above 3x because /usr/sbin itself contains duplicate blocks, which are deduplicated as well. Pretty cool. You can also get a simulated deduplication histogram for any pool (whether dedup=on or dedup=off) using zdb:

# zdb -S datapool
Simulated DDT histogram:
bucket allocated referenced
______ ______________________________ ______________________________
refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE
------ ------ ----- ----- ----- ------ ----- ----- -----
2      463    25.9M 25.9M 25.9M 1.36K  77.7M 77.7M 77.7M
4      50     3.71M 3.71M 3.71M 300    22.3M 22.3M 22.3M
8      3      51.5K 51.5K 51.5K 33     512K  512K  512K
16     49     5.91M 5.91M 5.91M 1.26K  158M  158M  158M
32     1      128K  128K  128K  42     5.25M 5.25M 5.25M
Total  566    35.7M 35.7M 35.7M 2.99K  263M  263M  263M
dedup = 7.38, compress = 1.00, copies = 1.00, dedup * compress / copies = 7.38
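
The dedup figure on the last line is the referenced DSIZE divided by the allocated DSIZE from the Total row: 263M / 35.7M gives roughly 7.37 from the rounded values printed (zdb reports 7.38 because it works from exact byte counts). A minimal sketch that recomputes it from saved zdb -S output; `ddt_ratio` is a hypothetical helper, and it assumes the column layout shown above, with sizes suffixed K, M, G or T.

```shell
#!/bin/sh
# ddt_ratio (hypothetical): read "zdb -S" output on stdin and recompute the
# dedup ratio from the Total row as referenced DSIZE / allocated DSIZE.
ddt_ratio() {
    awk '
        # Convert a size such as "35.7M" to bytes (K/M/G/T are powers of 1024).
        function bytes(s,    n, u) {
            n = s + 0                    # numeric prefix, e.g. 35.7
            u = substr(s, length(s), 1)  # trailing unit letter, if any
            if (u == "K") n *= 1024
            else if (u == "M") n *= 1024 ^ 2
            else if (u == "G") n *= 1024 ^ 3
            else if (u == "T") n *= 1024 ^ 4
            return n
        }
        # Total row: $5 is allocated DSIZE, $9 is referenced DSIZE.
        $1 == "Total" { printf "%.2f\n", bytes($9) / bytes($5) }
    '
}
```

Usage: `zdb -S datapool | ddt_ratio`.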

For all but the most demanding scenarios, where maintaining the deduplication table would strain system resources (above all memory, as the table should fit in RAM for good write performance), setting dedup=on on a dataset is normally a good idea.
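
A rough way to gauge that memory cost is to multiply the number of unique blocks (the allocated blocks column of the zdb -S Total row, 566 here) by the size of one in-core dedup table entry. The figure of about 320 bytes per entry used below is a commonly quoted rule of thumb rather than an exact value, and `ddt_mem_estimate` is a hypothetical helper; actual usage varies with ZFS version and platform.

```shell
#!/bin/sh
# ddt_mem_estimate BLOCKS [BYTES_PER_ENTRY] (hypothetical): rough in-memory
# size of the dedup table in bytes, assuming ~320 bytes per entry unless a
# different per-entry size is given.
ddt_mem_estimate() {
    echo $(( $1 * ${2:-320} ))
}
```

Usage: `ddt_mem_estimate 566` prints 181120, i.e. under 200K for this test pool, whereas a pool with 100 million unique blocks would need roughly 32 billion bytes by the same rule.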

Toki Winter

Advanced UNIX for the experienced system administrator

ZFS Part 2: Implementing Zpool/ZFS Configuration and ZFS Features
