OK – we’re good to go for our final ZFS configuration. Recall from earlier that I will be configuring a two-disk RAID1 set, with an extra disk for hot-spare use, and the final disk to play with for backups, encryption, dedup, etc.
    # zpool create datapool mirror c8t1d0 c8t2d0 spare c8t3d0
It was that easy:
    # zpool list
    NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
    datapool  19.9G    88K  19.9G   0%  1.00x  ONLINE  -
    rpool     29.5G  4.10G  25.4G  13%  1.00x  ONLINE  -
If we query the pool, all the required elements (mirroring, hot spare) are in place:
    # zpool status datapool
      pool: datapool
     state: ONLINE
      scan: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            datapool    ONLINE       0     0     0
              mirror-0  ONLINE       0     0     0
                c8t1d0  ONLINE       0     0     0
                c8t2d0  ONLINE       0     0     0
            spares
              c8t3d0    AVAIL

    errors: No known data errors
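If you would rather script that check than eyeball it, the status output is easy to scan with awk. The following is only a sketch: check_layout is a hypothetical helper name (not a ZFS command), and it assumes the output layout shown above, with a mirror-N vdev line and a spares section.

```shell
#!/bin/sh
# check_layout (hypothetical helper): reads `zpool status <pool>` output on
# stdin and reports whether it contains a mirror vdev and an AVAIL spare.
check_layout() {
    awk '
        $1 ~ /^mirror-/            { mirror = 1 }              # e.g. mirror-0
        $1 == "spares"             { in_spares = 1; next }     # spares section starts
        in_spares && $2 == "AVAIL" { spare = 1 }               # an idle hot spare
        END {
            printf "mirror=%s spare=%s\n", (mirror ? "yes" : "no"), (spare ? "yes" : "no")
            exit !(mirror && spare)                            # non-zero if either is missing
        }
    '
}

# On a live system: zpool status datapool | check_layout
# Fed with the relevant part of the output above:
check_layout <<'EOF'
        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
        spares
          c8t3d0    AVAIL
EOF
# prints: mirror=yes spare=yes
```

The function exits non-zero when either element is missing, so it can be dropped into a larger health-check script.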
The default ZFS dataset will also have been created:
    # zfs list datapool
    NAME       USED  AVAIL  REFER  MOUNTPOINT
    datapool    88K  19.6G    31K  /datapool
For the root dataset mount, that mountpoint will be fine. Each new ZFS dataset created will have a more appropriate mountpoint set.
So – let’s assume we need to install some software, and require a /u01 filesystem on which to install (a common sight if you’ve worked around Oracle software long enough).
    # zfs create datapool/u01
That simple command has created the new dataset for us:
    # zfs list | grep u01
    datapool/u01    31K  19.6G    31K  /datapool/u01
    # zfs list | grep datapool
    datapool       128K  19.6G    32K  /datapool
    datapool/u01    31K  19.6G    31K  /datapool/u01
Next, let’s move the mountpoint to somewhere more sensible, namely /u01:
    # zfs set mountpoint=/u01 datapool/u01
    # zfs list datapool/u01
    NAME           USED  AVAIL  REFER  MOUNTPOINT
    datapool/u01    31K  19.6G    31K  /u01
Again, another simple task; it even created the non-existent mountpoint for us:
    # df -h /u01
    Filesystem             Size   Used  Available Capacity  Mounted on
    datapool/u01            20G    31K        20G     1%    /u01
This dataset reports the same size as its parent, datapool, because every dataset draws on the pool's shared free space. That figure will only shrink once we start actually using it (or create and use other datasets).
Let’s turn on deduplication for the parent, so that any other datasets configured within datapool inherit it. I won’t turn on encryption or compression here; I’ll control those at a finer-grained, per-dataset level.
    # zfs set dedup=on datapool
Then we verify:
    # zfs get dedup datapool
    NAME      PROPERTY  VALUE  SOURCE
    datapool  dedup     on     local
datapool/u01 still reports dedup=off, and because its source is local rather than inherited, it will not pick up the parent's value. So we need to head over there and set it on for that dataset too:
    # zfs get dedup datapool/u01
    NAME          PROPERTY  VALUE  SOURCE
    datapool/u01  dedup     off    local
    # zfs set dedup=on datapool/u01
    # zfs get dedup datapool/u01
    NAME          PROPERTY  VALUE  SOURCE
    datapool/u01  dedup     on     local
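As more datasets appear, checking them one at a time gets tedious. zfs get accepts -r to recurse through a pool, and -H with -o to produce script-friendly, tab-separated output. Here is a minimal sketch of how that could be filtered. dedup_not_on is a hypothetical helper name, and the sample input simply mirrors the state shown above before the fix.

```shell
#!/bin/sh
# dedup_not_on (hypothetical helper): reads tab-separated
# "name<TAB>value<TAB>source" lines on stdin, as produced by
#   zfs get -rH -o name,value,source dedup <pool>
# and prints only the datasets whose dedup value is not "on".
dedup_not_on() {
    awk -F '\t' '$2 != "on" { printf "%s\t%s\t%s\n", $1, $2, $3 }'
}

# On a live system: zfs get -rH -o name,value,source dedup datapool | dedup_not_on
# Sample input shaped like the pool before datapool/u01 was fixed:
printf 'datapool\ton\tlocal\ndatapool/u01\toff\tlocal\n' | dedup_not_on
# prints: datapool/u01    off    local   (tab-separated)
```

An empty result means every dataset in the pool has dedup switched on, whether locally or by inheritance.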
Let’s start creating some more interesting child datasets.
ZFS Features
The first feature I wanted to try out was deduplication. Essentially, if identical blocks are written several times across a pool, ZFS keeps a single copy and references it from each place, thus improving storage utilisation.
I set dedup=on on datapool so any new dataset created will inherit that property from its parent. Therefore:
    # zfs create -o mountpoint=/dedupfs datapool/dedupfs
Verify as always:
    # zfs get dedup datapool/dedupfs
    NAME              PROPERTY  VALUE  SOURCE
    datapool/dedupfs  dedup     on     inherited from datapool
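You even get a nice little note that the value for this property has been inherited from the parent ZFS dataset, datapool. To give dedup something to work on, let’s copy the contents of /usr/sbin into three separate directories, timing each copy: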
    # mkdir /dedupfs/{a,b,c}
    # for i in a b c; do time cp -Rp /usr/sbin/* /dedupfs/${i}; done

    real    0m0.892s
    user    0m0.030s
    sys     0m0.567s

    real    0m0.608s
    user    0m0.032s
    sys     0m0.570s

    real    0m0.863s
    user    0m0.033s
    sys     0m0.788s
If we check via du, the “real” values are reported:
    # du -hs /dedupfs/*
     88M   /dedupfs/a
     88M   /dedupfs/b
     88M   /dedupfs/c
A zpool list, however, has something else to say:
    # zpool list datapool
    NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
    datapool  19.9G  37.1M  19.8G   0%  7.37x  ONLINE  -
You can see that only 37.1M is actually allocated, thanks to a deduplication factor of 7.37x. Across our three copies of /usr/sbin (3 × 88M = 264M as du sees it), the system was able to deduplicate by quite a large factor, saving us a couple of hundred meg. Pretty cool. You can also get a simulated deduplication histogram for a pool (whether dedup is on or off) using zdb:
    # zdb -S datapool
    Simulated DDT histogram:

    bucket              allocated                       referenced
    ______   ______________________________   ______________________________
    refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
    ------   ------   -----   -----   -----   ------   -----   -----   -----
         2      463   25.9M   25.9M   25.9M    1.36K   77.7M   77.7M   77.7M
         4       50   3.71M   3.71M   3.71M      300   22.3M   22.3M   22.3M
         8        3   51.5K   51.5K   51.5K       33    512K    512K    512K
        16       49   5.91M   5.91M   5.91M    1.26K    158M    158M    158M
        32        1    128K    128K    128K       42   5.25M   5.25M   5.25M
     Total      566   35.7M   35.7M   35.7M    2.99K    263M    263M    263M

    dedup = 7.38, compress = 1.00, copies = 1.00, dedup * compress / copies = 7.38
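The dedup figure on the last line is simply the referenced size divided by the allocated size from the Total row. As a rough sketch of that arithmetic, the snippet below recomputes it from that row. ddt_ratio is a hypothetical helper name, and it assumes the column layout shown above (field 5 is allocated DSIZE, field 9 is referenced DSIZE).

```shell
#!/bin/sh
# ddt_ratio (hypothetical helper): reads `zdb -S <pool>` output on stdin and
# prints referenced DSIZE / allocated DSIZE taken from the "Total" row.
ddt_ratio() {
    awk '
        # Convert a human-readable size such as 35.7M or 512K to bytes.
        function to_bytes(s,    n, u) {
            n = s + 0                      # leading numeric part
            u = substr(s, length(s), 1)    # trailing unit letter, if any
            if (u == "K") return n * 1024
            if (u == "M") return n * 1024 * 1024
            if (u == "G") return n * 1024 * 1024 * 1024
            if (u == "T") return n * 1024 * 1024 * 1024 * 1024
            return n
        }
        $1 == "Total" { printf "%.2f\n", to_bytes($9) / to_bytes($5) }
    '
}

# On a live system: zdb -S datapool | ddt_ratio
# Fed with the Total row from the run above:
echo ' Total 566 35.7M 35.7M 35.7M 2.99K 263M 263M 263M' | ddt_ratio
# prints: 7.37
```

The result is 7.37 rather than zdb's 7.38 because the histogram shows rounded sizes, while zdb works from the exact byte counts.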
Unless the system is too short on the CPU and memory that deduplication needs, setting dedup=on on a dataset is normally a good idea.
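To put a rough number on that resource cost, Oracle's Solaris ZFS documentation gives a rule of thumb of about 320 bytes of memory per in-core dedup table entry, with one entry per allocated block. The sketch below applies it to the block count in the Total row of the zdb -S histogram. ddt_mem_bytes is a hypothetical helper name, and the 320-byte figure is an approximation, not an exact cost.

```shell
#!/bin/sh
# ddt_mem_bytes (hypothetical helper): rule-of-thumb estimate of the in-core
# dedup table size, assuming roughly 320 bytes per allocated block.
ddt_mem_bytes() {
    blocks=$1
    echo $(( blocks * 320 ))
}

# 566 allocated blocks, taken from the Total row of the zdb -S output
ddt_mem_bytes 566
# prints: 181120 (about 177K)
```

For this test pool the cost is negligible. The same sum for a pool with millions of unique blocks runs into hundreds of megabytes or more, which is where dedup starts to hurt.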