Chapter 4: ZFS Datasets

With ordinary filesystems you create partitions to separate different types of data, apply different optimizations to them, and limit how much of your space the partition can consume. Each partition receives a specific amount of space from the disk. We’ve all been there. We make our best guesses at how much disk space each partition on this system will need next month, next year, and five years from now. Fast forward to the future, and the amount of space you decided to give each partition is more than likely wrong. A partition without enough space for all its data sends you adding disks or moving data, complicating system management. When a partition has too much space, you kick yourself and use it as a dumping ground for stuff you’d rather have elsewhere. More than one of Lucas’ UFS2 systems has /usr/ports as a symlink to somewhere in /home. Jude usually ends up with some part of /var living in /usr/local/var.

ZFS solves this problem by pooling free space, giving your partitions flexibility impossible with more common filesystems. Each ZFS dataset you create consumes only the space required to store the files within it. Each dataset has access to all of the free space in the pool, eliminating your worries about the size of your partitions. You can limit the size of a dataset with a quota or guarantee it a minimum amount of space with a reservation, as discussed in Chapter 6.

Regular filesystems use separate partitions to establish different policies and optimizations for different types of data. /var contains often-changing files like logs and databases. The root filesystem needs consistency and safety over performance. Over in /home, anything goes. Once you establish a policy for a traditional filesystem, though, it’s really hard to change. The tunefs(8) utility for UFS requires the filesystem be unmounted to make changes. Some characteristics, such as the number of inodes, just cannot be changed after the filesystem has been created.

The core problem of traditional filesystems distills to inflexibility. ZFS datasets are almost infinitely flexible.

Datasets

A dataset is a named chunk of data. This data might resemble a traditional filesystem, with files, directories, and permissions and all that fun stuff. It could be a raw block device, or a copy of other data, or anything you can cram onto a disk.

ZFS uses datasets much like a traditional filesystem might use partitions. Need a policy for /usr and a separate policy for /home? Make each a dataset. Need a block device for an iSCSI target? That’s a dataset. Want a copy of a dataset? That’s another dataset.

Datasets have a hierarchical relationship. A single storage pool is the parent of each top-level dataset. Each dataset can have child datasets. Datasets inherit many characteristics from their parent, as we’ll see throughout this chapter.

You’ll perform all dataset operations with the zfs(8) command. This command has all sorts of sub-commands.

Dataset Types

ZFS currently has five types of datasets: filesystems, volumes, snapshots, clones, and bookmarks.

A filesystem dataset resembles a traditional filesystem. It stores files and directories. A ZFS filesystem has a mount point and supports traditional filesystem characteristics like read-only, restricting setuid binaries, and more. Filesystem datasets also hold other information, including permissions, timestamps for file creation and modification, NFSv4 Access Control Lists, chflags(2), and the like.

A ZFS volume, or zvol, is a block device. In an ordinary filesystem, you might create a file-backed filesystem for iSCSI or a special-purpose UFS partition. On ZFS, these block devices bypass all the overhead of files and directories and reside directly on the underlying pool. Zvols get a device node, skipping the FreeBSD memory devices used to mount disk images.

A snapshot is a read-only copy of a dataset from a specific point in time. Snapshots let you retain previous versions of your filesystem and the files therein for later use. Snapshots use an amount of space based on the difference between the current filesystem and what’s in the snapshot.

A clone is a new dataset based on a snapshot of an existing dataset, allowing you to fork a filesystem. You get an extra copy of everything in the dataset. You might clone the dataset containing your production web site, giving you a copy of the site that you can hack on without touching the production site. A clone only consumes space to store the differences from the original snapshot it was created from. Chapter 7 covers snapshots, clones, and bookmarks.

Why Do I Want Datasets?

You obviously need datasets. Putting files on the disk requires a filesystem dataset. And you probably want a dataset for each traditional Unix partition, like /usr and /var. But with ZFS, you want a lot of datasets. Lots and lots and lots of datasets. This would be cruel madness with a traditional filesystem, with its hard-coded limits on the number of partitions and the inflexibility of those partitions. But using many datasets increases the control you have over your data.

Each ZFS dataset has a series of properties that control its operation, allowing the administrator to control how the dataset performs and how carefully it protects its data. You can tune each dataset exactly as you can with a traditional filesystem. Dataset properties work much like pool properties.

The sysadmin can delegate control over individual datasets to another user, allowing the user to manage them without root privileges. If your organization has a whole bunch of project teams, you can give each project manager their own chunk of space and say, “Here, arrange it however you want.” Anything that reduces our workload is a good thing.

Many ZFS features, such as replication and snapshots, operate on a per-dataset basis. Separating your data into logical groups makes it easier to use these ZFS features to support your organization.

Take the example of a web server with dozens of sites, each maintained by different teams. Some teams are responsible for multiple sites, while others have only one. Some people belong to multiple teams. If you follow the traditional filesystem model, you might create a /webserver dataset, put everything in it, and control access with group permissions and sudo(8). You’ve lived like this for decades, and it works, so why change?

But create a dataset for each team, and give each site its own dataset within that parent dataset, and possibilities multiply.

A team needs a copy of a web site for testing? Clone it. With traditional filesystems, you’d have to copy the whole site directory, doubling the amount of disk needed for the site and taking much, much longer. A clone uses only the amount of space for the differences between the sites and appears instantaneously.

The team is about to deploy a new version of a site, but wants a backup of the old site? Create a snapshot. This new site probably uses a whole bunch of the same files as the old one, so you’ll reduce disk space usage. Plus, when the deployment goes horribly wrong, you can restore the old version by rolling back to the snapshot.

A particular web site needs filesystem-level performance tweaks, or compression, or some locally created property? Set it for that site.

You might create a dataset for each team, and then let the teams create their own child datasets for their own sites. You can organize your datasets to fit your people, rather than organizing your people to fit your technology.

When you must change a filesystem setting (property) on all of the sites, make the change to the parent dataset and let the children inherit it.

The same benefits apply to user home directories.

You can also move datasets between machines. Your web sites overflow the web server? Send half the datasets, along with their custom settings and all their clones and snapshots, to the new server.

There is one disadvantage to using many filesystem datasets. When you move a file within a filesystem, the file is renamed. Moving files between separate filesystems requires copying the file to a new location and deleting it from the old, rather than just renaming it. Inter-dataset file copies take more time and require more free space. But that’s trivial against all the benefits ZFS gives you with multiple datasets. This problem exists on other filesystems as well, but hosts using most other filesystems have only a few partitions, making it less obvious.

Viewing Datasets

The zfs list command shows all of the datasets, and some basic information about them.

# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
mypool               420M  17.9G    96K  none
mypool/ROOT          418M  17.9G    96K  none
mypool/ROOT/default  418M  17.9G   418M  /
...

The first field shows the dataset’s name.

Under USED and REFER you find information about how much disk space the dataset uses. One downside to ZFS’ incredible flexibility and efficiency is that its interpretation of disk space usage seems somewhat surreal if you don’t understand it. Chapter 6 discusses disk space and strategies to use it.

The AVAIL column shows how much space remains free in the pool or dataset.

Finally MOUNTPOINT shows where the dataset should be mounted. That doesn’t mean that the dataset is mounted, merely that if it were to be mounted, this is where it would go. (Use zfs mount to see all mounted ZFS filesystems.)

If you give a dataset as an argument, zfs list shows only that specific dataset.

# zfs list mypool/lamb
NAME         USED  AVAIL  REFER  MOUNTPOINT
mypool/lamb  192K  17.9G    96K  /lamb

Restrict the type of dataset shown with the -t flag and the type. You can show filesystems, volumes, or snapshots. Here we display snapshots, and only snapshots.

# zfs list -t snapshot
NAME                     USED  AVAIL  REFER  MOUNTPOINT
zroot/var/log/db@backup     0      -  10.0G  -

Now that you can see filesystems, let’s make some.

Creating, Moving, and Destroying Datasets

Use the zfs create command to create any dataset. We’ll look at snapshots, clones, and bookmarks in Chapter 7, but let’s discuss filesystems and volumes now.

Creating Filesystems

Filesystems are the most common type of dataset on most systems. Everyone needs a place to store and organize files. Create a filesystem dataset by specifying the pool and the filesystem name.

# zfs create mypool/lamb

This creates a new dataset, lamb, on the ZFS pool called mypool. If the pool has a default mount point, the new dataset is mounted by default (see “Mounting ZFS Filesystems” later this chapter).

# mount | grep lamb
mypool/lamb on /lamb (zfs, local, noatime, nfsv4acls)

The mount settings in parentheses are usually ZFS properties, inherited from the parent dataset. To create a child filesystem, give the full path to the parent filesystem.

# zfs create mypool/lamb/baby

The dataset inherits many of its characteristics, including its mount point, from the parent, as we’ll see in “Parent/Child Relationships” later in this chapter.

Creating Volumes

Use the -V flag and a volume size to tell zfs create that you want to create a volume. Give the full path to the volume dataset.

# zfs create -V 4G mypool/avolume

Zvols show up in a dataset list like any other dataset. You can tell zfs list to show only zvols by adding the -t volume option.

# zfs list mypool/avolume
NAME             USED  AVAIL  REFER  MOUNTPOINT
mypool/avolume  4.13G  17.9G    64K  -

Zvols automatically reserve an amount of space equal to the size of the volume plus the ZFS metadata. This 4 GB zvol uses 4.13 GB of space.

As block devices, zvols do not have a mount point. They do get a device node under /dev/zvol, so you can access them as you would any other block device.

# ls -al /dev/zvol/mypool/avolume
crw-r-----  1 root  operator  0x4d Mar 27 20:22 /dev/zvol/mypool/avolume

You can run newfs(8) on this device node, copy a disk image to it, and generally use it like any other block device.
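
For instance, you might put a UFS filesystem on the zvol created above and mount it. This is a minimal sketch; the mount point is arbitrary.

# newfs /dev/zvol/mypool/avolume
# mount /dev/zvol/mypool/avolume /mnt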

Renaming Datasets

You can rename a dataset with, oddly enough, the zfs rename command. Give the dataset’s current name as the first argument and the new location as the second.

# zfs rename db/production db/old
# zfs rename db/testing db/production

Use the -f flag to forcibly rename the dataset. You cannot unmount a filesystem with processes running in it, but the -f flag gleefully forces the unmount. Any process using the dataset loses access to whatever it was using, and reacts however it will.1

Moving Datasets

You can move a dataset from part of the ZFS tree to another, making the dataset a child of its new parent. This may cause many of the dataset’s properties to change, since children inherit properties from their parent. Any properties set specifically on the dataset will not change.

Here we move a database out from under the zroot/var/db dataset, to a new parent where you have set some properties to improve fault tolerance.

# zfs rename zroot/var/db/mysql zroot/important/mysql

Note that since mount points are inherited, this will likely change the dataset’s mount point. Adding the -u flag to the rename command will cause ZFS not to immediately change the mount point, giving you time to reset the property to the intended value. Remember that if the machine is restarted, or the dataset is manually remounted, it will use its new mount point.
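
A sketch of the same move with -u, followed by pinning the mount point back to its old location (the mount point value here is illustrative):

# zfs rename -u zroot/var/db/mysql zroot/important/mysql
# zfs set mountpoint=/var/db/mysql zroot/important/mysql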

You can rename a snapshot, but you cannot move snapshots out of their parent dataset. Snapshots are covered in detail in Chapter 7.

Destroying Datasets

Sick of that dataset? Drag it out behind the barn and put it out of your misery with zfs destroy.

# zfs destroy db/old

If you add the -r flag, you recursively destroy all children (datasets, snapshots, etc.) of the dataset. To destroy any cloned datasets while you’re at it, use -R. Be very careful recursively destroying datasets, as you can frequently be surprised by what, exactly, is a child of a dataset.

You might use the -v and -n flags to see exactly what will happen when you destroy a dataset. The -v flag prints verbose information about what gets destroyed, while -n tells zfs(8) to perform a dry run. Between the two, they show what this command would actually destroy before you pull the trigger.
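
For example, to preview a recursive destroy without actually running it (the dataset name is illustrative):

# zfs destroy -rvn mypool/lamb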

ZFS Properties

ZFS datasets have a number of settings, called properties, that control how the dataset works. While you can set a few of these only when you create the dataset, most of them are tunable while the dataset is live. ZFS also offers a number of read-only properties that provide information such as the amount of space consumed by the dataset, the compression or deduplication ratios, and the creation time of the dataset.

Each dataset inherits its properties from its parent, unless the property is specifically set on that dataset.

Viewing Properties

The zfs(8) tool can retrieve a specific property, or all properties for a dataset. Use the zfs get command with the property name and the dataset.

# zfs get compression mypool/lamb
NAME         PROPERTY     VALUE  SOURCE
mypool/lamb  compression  lz4    inherited from mypool

Under NAME we see the dataset we asked about, and PROPERTY shows the property we requested. The VALUE is what the property is set to.

The SOURCE is a little more complicated. A source of default means that this property is set to ZFS’ default. A local source means that someone deliberately set this property on this dataset. A temporary property was set when the dataset was mounted, and this property reverts to its usual value when the dataset is unmounted. An inherited property comes from a parent dataset, as discussed in “Parent/Child Relationships” later in this chapter.

Some properties have no source because the source is either irrelevant or inherently obvious. The creation property, which records the date and time the dataset was created, has no source. The value came from the system clock.

If you don’t specify a dataset name, zfs get shows the value of this property for all datasets. The special property keyword all retrieves all of a dataset’s properties.

# zfs get all mypool/lamb
NAME         PROPERTY  VALUE                  SOURCE
mypool/lamb  type      filesystem             -
mypool/lamb  creation  Fri Mar 27 20:05 2015  -
mypool/lamb  used      192K                   -
...

If you use all and don’t give a dataset name, you get all the properties for all datasets. This is a lot of information. Show multiple properties by separating the property names with commas.

# zfs get quota,reservation zroot/home
NAME        PROPERTY     VALUE  SOURCE
zroot/home  quota        none   local
zroot/home  reservation  none   default

You can also view properties with zfs list and the -o modifier. This is most suited for when you want to view several properties from multiple datasets. Use the special property name to show the dataset’s name.

# zfs list -o name,quota,reservation
NAME                QUOTA  RESERV
db                   none    none
zroot                none    none
zroot/ROOT           none    none
zroot/ROOT/default   none    none
...
zroot/var/log        100G     20G
...

You can also add a dataset name to see these properties in this format for that dataset.

Changing Properties

Change properties with the zfs set command. Give the property name, the new setting, and the dataset name. Here we change the compression property to off.

# zfs set compression=off mypool/lamb/baby

Confirm your change with zfs get.

# zfs get compression mypool/lamb/baby
NAME              PROPERTY     VALUE  SOURCE
mypool/lamb/baby  compression  off    local

Most properties apply only to data written after the property is changed. The compression property tells ZFS to compress data before writing it to disk. We talk about compression in Chapter 6. Disabling compression doesn’t uncompress any data written before the change was made. Similarly, enabling compression doesn’t magically compress data already on the disk. To get the full benefit of enabling compression, you must rewrite every file. You’re better off creating a new dataset, copying the data over with zfs send, and destroying the original dataset.

Read-Only Properties

ZFS uses read-only properties to offer basic information about the dataset. Disk space usage is expressed as properties. You can’t change how much data you’re using by changing the property that says “your disk is half-full.” (Chapter 6 covers ZFS disk space usage.) The creation property records when this dataset was created. You can change many read-only properties by adding or removing data to the disk, but you can’t write these properties directly.

Filesystem Properties

One key tool for managing the performance and behavior of traditional filesystems is mount options. You can mount traditional filesystems read-only, or use the noexec flag to disable running programs from them. ZFS uses properties to achieve the same effects. Here are the properties used to accomplish these familiar goals.

atime

A file’s atime indicates when the file was last accessed. ZFS’ atime property controls whether the dataset tracks access times. The default value, on, updates the file’s atime metadata every time the file is accessed. Using atime means writing to the disk every time it’s read.

Turning this property off avoids writing to the disk when you read a file, and can result in significant performance gains. It might confuse mailers and other similar utilities that depend on being able to determine when a file was last read.

Leaving atime on also increases snapshot size. The first time a file is accessed after a snapshot is taken, its atime is updated. The snapshot retains the original access time, while the live filesystem contains the newly updated access time.
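
If the performance cost isn’t worth it for a given dataset, turning atime off is a single property change (the dataset name is illustrative):

# zfs set atime=off mypool/lamb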

exec

The exec property determines if anyone can run binaries and commands on this filesystem. The default is on, which permits execution. Some environments don’t permit users to execute programs from their personal or temporary directories. Set the exec property to off to disable execution of programs on the filesystem.

The exec property doesn’t prohibit people from running interpreted scripts, however. If a user can run /bin/sh, they can run /bin/sh /home/mydir/script.sh. The shell is what’s actually executing—it only takes instructions from the script.

readonly

If you don’t want anything writing to this dataset, set the readonly property to on. The default, off, lets users modify the dataset within administrative permissions.

setuid

Many people consider setuid programs risky.2 While some setuid programs must be setuid, such as passwd(1) and login(1), there’s rarely a need to have setuid programs on filesystems like /home and /tmp. Many sysadmins disallow setuid programs except on specific filesystems.

ZFS’ setuid property toggles setuid support. If set to on, the filesystem supports setuid. If set to off, the setuid flag is ignored.

User-Defined Properties

ZFS properties are great, and you can’t get enough of them, right? Well, start adding your own. The ability to store your own metadata along with your datasets lets you develop whole new realms of automation. The fact that children automatically inherit these properties makes life even easier.

To make sure your custom properties remain yours, and don’t conflict with other people’s custom properties, create a namespace. Most people prefix their custom properties with an organizational identifier and a colon. For example, FreeBSD-specific properties have the format “org.freebsd:propertyname,” such as org.freebsd:swap. If the illumos project creates its own property named swap, they’d call it org.illumos:swap. The two values won’t collide.

For example, suppose Jude wants to control which datasets get backed up via a dataset property. He creates the namespace com.allanjude.3 Within that namespace, he creates the property backup_ignore.

# zfs set com.allanjude:backup_ignore=on mypool/lamb

Jude’s backup script checks the value of this property. If it’s set to on, the backup process skips this dataset.
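
A backup script might test the property with zfs get -H -o value, which prints the bare value with no headers. Here’s a minimal sh(1) sketch of such a check; the script and dataset name are our illustration, not Jude’s actual backup code.

#!/bin/sh
# Skip any dataset marked with the custom property.
ds="mypool/lamb"
if [ "$(zfs get -H -o value com.allanjude:backup_ignore "$ds")" = "on" ]; then
    echo "skipping $ds"
else
    echo "backing up $ds"
fi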

Parent/Child Relationships

Datasets inherit properties from their parent datasets. When you set a property on a dataset, that property applies to that dataset and all of its children. For convenience, you can run zfs(8) commands on a dataset and all of its children by adding the -r flag. Here, we query the compression property on a dataset and all of its children.

# zfs get -r compression mypool/lamb
NAME              PROPERTY     VALUE  SOURCE
mypool/lamb       compression  lz4    inherited from mypool
mypool/lamb/baby  compression  off    local

Look at the source values. The first dataset, mypool/lamb, inherited this property from the parent pool. In the second dataset, this property has a different value. The source is local, meaning that the property was set specifically on this dataset.

We can restore the original setting with the zfs inherit command.

# zfs inherit compression mypool/lamb/baby
# zfs get -r compression mypool/lamb
NAME              PROPERTY     VALUE  SOURCE
mypool/lamb       compression  lz4    inherited from mypool
mypool/lamb/baby  compression  lz4    inherited from mypool

The child now inherits the compression properties from the parent, which inherits from the grandparent.

When you change a parent’s properties, the new properties automatically propagate down to the child.

# zfs set compression=gzip-9 mypool/lamb
# zfs get -r compression mypool/lamb
NAME              PROPERTY     VALUE   SOURCE
mypool/lamb       compression  gzip-9  local
mypool/lamb/baby  compression  gzip-9  inherited from mypool/lamb

We told the parent dataset to use gzip-9 compression. That percolated down to the child.

Inheritance and Renaming

When you move or rename a dataset so that it has a new parent, the parent’s properties automatically propagate down to the child. Locally set properties remain unchanged, but inherited ones switch to those from the new parent.

Here we create a new parent dataset and check its compression property.

# zfs create mypool/second
# zfs get compress mypool/second
NAME           PROPERTY     VALUE  SOURCE
mypool/second  compression  lz4    inherited from mypool

Our baby dataset uses gzip-9 compression. It’s inherited this property from mypool/lamb. Now let’s move baby to be a child of second, and see what happens to the compression property.

# zfs rename mypool/lamb/baby mypool/second/baby
# zfs get -r compression mypool/second
NAME                PROPERTY     VALUE  SOURCE
mypool/second       compression  lz4    inherited from mypool
mypool/second/baby  compression  lz4    inherited from mypool

The child dataset now belongs to a different parent, and inherits its properties from the new parent. The child keeps any local properties.

Data on the baby dataset is a bit of a tangle, however. Data written before compression was turned on is uncompressed. Data written while the dataset used gzip-9 compression is compressed with gzip-9. Any data written now will be compressed with lz4. ZFS sorts all this out for you automatically, but thinking about it does make one's head hurt.

Removing Properties

While you can set a property back to its default value, it’s not obvious how to change the source back to inherit or default, or how to remove custom properties once they’re set.

To remove a custom property, inherit it.

# zfs inherit com.allanjude:backup_ignore mypool/lamb

This works even if you set the property on the root dataset.

To reset a property to its default value on a dataset and all its children, or totally remove custom properties, use the zfs inherit command on the pool’s root dataset.

# zfs inherit -r compression mypool

It’s counterintuitive, but it knocks the custom setting off of the root dataset.

Mounting ZFS Filesystems

With traditional filesystems you listed each partition, its type, and where it should be mounted in /etc/fstab. You even listed temporary mounts such as floppies and CD-ROM drives, just for convenience. ZFS allows you to create such a large number of filesystems that this quickly grows impractical.

Each ZFS filesystem has a mountpoint property that defines where it should be mounted. The default mountpoint is built from the pool’s mountpoint. If a pool doesn’t have a mount point, you must assign a mount point to any datasets you want to mount.

# zfs get mountpoint zroot/usr/home
NAME            PROPERTY    VALUE      SOURCE
zroot/usr/home  mountpoint  /usr/home  inherited from zroot/usr

The filesystem normally gets mounted at /usr/home. You could override this when manually mounting the filesystem.

The zroot pool used for a default FreeBSD install doesn’t have a mount point set. If you create new datasets directly under zroot, they won’t have a mount point. Datasets created on zroot under, say, /usr, inherit a mount point from their parent dataset.

Any pool other than the pool with the root filesystem normally has a mount point named after the pool. If you create a pool named db, it gets mounted at /db. All children inherit their mount point from that pool unless you change them.

When you change the mountpoint property for a filesystem, the filesystem and any children that inherit the mount point are unmounted. If the new value is legacy, then they remain unmounted. Otherwise, they are automatically remounted in the new location if the property was previously legacy or none, or if they were mounted before the property was changed. In addition, any shared filesystems are unshared and shared in the new location.

Just like ordinary filesystems, ZFS filesystems aren’t necessarily mounted. The canmount property controls a filesystem’s mount behavior. If canmount is set to on, running zfs mount -a mounts the filesystem, just like mount -a. When you enable ZFS in /etc/rc.conf, FreeBSD runs zfs mount -a at startup.

When the canmount property is set to noauto, a dataset can only be mounted and unmounted explicitly. The dataset is not mounted automatically when the dataset is created or imported, nor is it mounted by the zfs mount -a command or unmounted by zfs unmount -a.

Things can get interesting when you set canmount to off. You might have two non-mountable datasets with the same mount point. A dataset can exist solely for the purpose of being the parent to future datasets, but not actually store files, as we’ll see below.

Child datasets do not inherit the canmount property.

Changing the canmount property does not automatically unmount or mount the filesystem. If you disable mounting on a mounted filesystem, you’ll need to manually unmount the filesystem or reboot.
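
For example (the dataset name is illustrative):

# zfs set canmount=off mypool/second
# zfs unmount mypool/second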

Datasets without Mount Points

ZFS datasets are hierarchical. You might need to create a dataset that will never contain any files only so it can be the common parent of a number of other datasets. Consider a default install of FreeBSD 10.1 or newer.

# zfs mount
zroot/ROOT/default   /
zroot/tmp            /tmp
zroot/usr/home       /usr/home
zroot/usr/ports      /usr/ports
zroot/usr/src        /usr/src
...

We have all sorts of datasets under /usr, but there’s no /usr dataset mounted. What’s going on?

A zfs list shows that a dataset exists, and it has a mount point of /usr. But let’s check the mountpoint and canmount properties of zroot/usr and all its children.

# zfs list -o name,canmount,mountpoint -r zroot/usr
NAME             CANMOUNT  MOUNTPOINT
zroot/usr        off       /usr
zroot/usr/home   on        /usr/home
zroot/usr/ports  on        /usr/ports
zroot/usr/src    on        /usr/src

With canmount set to off, the zroot/usr dataset is never mounted. Any files written in /usr, such as the commands in /usr/bin and the packages in /usr/local, go into the root filesystem. Lower-level mount points such as /usr/src have their own datasets, which are mounted.

The dataset exists only to be a parent to the child datasets. You’ll see something similar with the /var partitions.

Multiple Datasets with the Same Mount Point

Setting canmount to off allows datasets to be used solely as a mechanism to inherit properties. One reason to set canmount to off is to have two datasets with the same mount point, so that the children of both datasets appear in the same directory, but might have different inherited characteristics.

FreeBSD’s installer does not set a mount point on the default pool, zroot. When you create a new dataset, you must assign a mount point to it.

If you don’t want to assign a mount point to every dataset you create right under the pool, you might assign a mountpoint of / to the zroot pool and leave canmount set to off. This way, when you create a new dataset, it has a mountpoint to inherit. This is a very simple example of using multiple datasets with the same mount point.

Imagine you want an /opt directory with two sets of subdirectories. Some of these directories contain programs, and should never be written to after installation. Other directories contain data. You must lock down the ability to run programs at the filesystem level.

# zfs create db/programs
# zfs create db/data

Now give both of these datasets the mountpoint of /opt and tell them that they cannot be mounted.

# zfs set canmount=off db/programs
# zfs set mountpoint=/opt db/programs

Install your programs to the dataset, and then make it read-only.

# zfs set readonly=on db/programs

You can’t run programs from the db/data dataset, so turn off exec and setuid. We need to write data to these directories, however.

# zfs set canmount=off db/data
# zfs set mountpoint=/opt db/data
# zfs set setuid=off db/data
# zfs set exec=off db/data

Now create some child datasets. The children of the db/programs dataset inherit that dataset’s properties, while the children of the db/data dataset inherit the other set of properties.

# zfs create db/programs/bin
# zfs create db/programs/sbin
# zfs create db/data/test
# zfs create db/data/production

We now have four datasets mounted inside /opt, two for binaries and two for data. As far as users know, these are normal directories. No matter what the file permissions say, though, nobody can write to two of these directories. Regardless of what trickery people pull, the system won’t recognize executables and setuid files in the other two. When you need another dataset for data or programs, create it as a child of the dataset with the desired settings. Changes to the parent datasets propagate immediately to all the children.

Pools without Mount Points

While a pool is normally mounted at a directory named after the pool, that isn’t necessarily so.

# zfs set mountpoint=none mypool

This pool no longer gets mounted. Neither does any dataset on the pool unless you specify a mount point. This is how the FreeBSD installer creates the pool for the OS.

# zfs set mountpoint=/someplace mypool/lamb

The directory will be created if necessary and the filesystem mounted.

Manually Mounting and Unmounting Filesystems

To manually mount a filesystem, use zfs mount and the dataset name. This is most commonly used for filesystems with canmount set to noauto.

# zfs mount mypool/usr/src

To unmount a filesystem and all of its children, use zfs unmount.

# zfs unmount mypool/second

If you want to temporarily mount a dataset at a different location, use the -o flag to specify a new mount point. This mount point only lasts until you unmount the dataset.

# zfs mount -o mountpoint=/mnt mypool/lamb

You can only mount a dataset if it has a mountpoint defined. Defining a temporary mount point when the dataset has no mount point gives you an error.

ZFS and /etc/fstab

You can choose to manage some or all of your ZFS filesystem mount points with /etc/fstab if you prefer. Set the dataset’s mountpoint property to legacy; ZFS then leaves mounting the filesystem entirely up to you.

# zfs set mountpoint=legacy mypool/second

Now you can mount this dataset with the mount(8) command:

# mount -t zfs mypool/second /tmp/second

You can also add ZFS datasets to the system’s /etc/fstab. Use the full dataset name as the device node. Set the type to zfs. You can use the standard filesystem options of noatime, noexec, readonly or ro, and nosuid. (You could also explicitly give the default behaviors of atime, exec, rw, and suid, but these are ZFS’ defaults.) The mount order is normal, but the fsck field is ignored. Here’s an /etc/fstab entry that mounts the dataset scratch/junk nosuid at /tmp.

scratch/junk  /tmp  zfs  nosuid  2  0

We recommend using ZFS properties to manage your mounts, however. Properties can do almost everything /etc/fstab does, and more.

Tweaking ZFS Volumes

Zvols are pretty straightforward—here’s a chunk of space as a block device; use it. You can adjust how a volume uses space and what kind of device node it offers.

Space Reservations

The volsize property of a zvol specifies the volume’s logical size. By default, creating a volume reserves an amount of space for the dataset equal to the volume size. (If you look ahead to Chapter 6, it establishes a refreservation of equal size.) Changing volsize changes the reservation. The volsize can only be set to a multiple of the volblocksize property, and cannot be zero.

Without the reservation, the volume could run out of space, resulting in undefined behavior or data corruption, depending on how the volume is used. These effects can also occur when the volume size is changed while it is in use, particularly when shrinking the size. Adjusting the volume size can confuse applications using the block device.

Zvols also support sparse volumes, also known as thin provisioning. A sparse volume is a volume where the reservation is less than the volume size. Essentially, using a sparse volume permits allocating more space than the dataset has available. With sparse provisioning you could, say, create ten 1 TB sparse volumes on your 5 TB dataset. So long as your volumes are never heavily used, nobody will notice that you’re overcommitted.

Sparse volumes are not recommended. Writes to a sparse volume can fail with an “out of space” error even if the volume itself looks only partially full.

Specify a sparse volume at creation time by specifying the -s option to the zfs create -V command. Changes to volsize are not reflected in the reservation. You can also reduce the reservation after the volume has been created.
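
For example, to create a 1 TB sparse volume (the name and size are illustrative):

# zfs create -s -V 1T mypool/sparsevol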

Zvol Mode

FreeBSD normally exposes zvols to the operating system as geom(4) providers, giving them maximum flexibility. You can change this with the volmode property.

Setting a volume’s volmode to dev exposes volumes only as a character device in /dev. Such volumes can be accessed only as raw disk device files. They cannot be partitioned or mounted, and they cannot participate in RAIDs or other GEOM features. They are faster. In some cases where you don’t trust the device using the volume, dev mode can be safer.

Setting volmode to none means that the volume is not exposed outside ZFS. These volumes can be snapshotted, cloned, and replicated, however. These volumes can be suitable for backup purposes.

Setting volmode to default means that volume exposure is controlled by the sysctl vfs.zfs.vol.mode. You can set the default zvol mode system-wide. A value of 1 means the default is geom, 2 means dev, and 3 means none.
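
For example, to make dev the system-wide default for newly created or imported zvols:

# sysctl vfs.zfs.vol.mode=2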

While you can change the property on a live volume, it has no effect. This property is processed only during volume creation and pool import. You can recreate the zvol device by renaming the volume with zfs rename.
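
Renaming the volume to a scratch name and back recreates its device node under the new mode. A sketch, with illustrative names:

# zfs set volmode=dev mypool/avolume
# zfs rename mypool/avolume mypool/scratch
# zfs rename mypool/scratch mypool/avolume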

Dataset Integrity

Most of ZFS’ protections work at the VDEV layer. That’s where blocks and disks go bad, after all. Some hardware limits pool redundancy, however. Very few laptops have enough hard drives to use mirroring, let alone RAID-Z. You can do some things at the dataset layer to offer some redundancy, however, by using checksums, metadata redundancy, and copies. Most users should never touch the first two, and users with redundant virtual devices probably want to leave all three alone.

Checksums

ZFS computes and stores checksums for every block that it writes. This ensures that when a block is read back, ZFS can verify that it is the same as when it was written, and has not been silently corrupted in one way or another. The checksum property controls which checksum algorithm the dataset uses. Valid settings are on, fletcher2, fletcher4, sha256, off, and noparity.

The default value, on, uses the algorithm selected by the OpenZFS developers. In 2015 that algorithm is fletcher4, but it might change in future releases.

The standard algorithm, fletcher4, is the default checksum algorithm. It’s good enough for most use and is very fast. If you want to use fletcher4 forever and ever, you could set this property to fletcher4. We recommend keeping the default of on, however, and letting ZFS upgrade your pool’s checksum algorithm when it’s time.

The value off disables integrity checking on user data.

The value noparity not only disables integrity but also disables maintaining parity for user data. This setting is used internally by a dump device residing on a RAID-Z pool and should not be used by any other dataset. Disabling checksums is not recommended.

Older versions of ZFS used the fletcher2 algorithm. While it’s supported for older pools, it’s certainly not encouraged. The sha256 algorithm is slower than fletcher4, but less likely to result in a collision. In most cases, a collision is not harmful.

The sha256 algorithm is frequently recommended when doing deduplication.
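
Switching a dataset’s checksum algorithm is an ordinary property change; like compression, it affects only newly written blocks. The dataset name here is illustrative:

# zfs set checksum=sha256 mypool/lamb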

Copies

ZFS stores two or three copies of important metadata, and can give the same treatment to your important user data. The copies property tells ZFS how many copies of user data to keep. ZFS attempts to put those copies on different disks, or failing that, as far apart on the physical disk as possible, to help guard against hardware failure. When you increase the copies property, ZFS also increases the number of copies of the metadata for that dataset, to a maximum of three.

If your pool runs on two mirrored disks, and you set copies to 3, you’ll have six copies of your data. One of them should survive your ill-advised use of dd(1) on the raw provider device or that plunge off the roof.

Increasing or decreasing copies only affects data written after the setting change. Changing copies from 1 to 2 doesn’t suddenly create duplicate copies of all your data, as we see here. Create a 10 MB file of random data:

# dd if=/dev/random of=/lamb/random1 bs=1m count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.144787 secs (72421935 bytes/sec)
# zfs set copies=2 mypool/lamb

Now every block is stored twice. If one of the copies becomes corrupt, ZFS can still read your file. It knows which of the blocks is corrupt because its checksums won’t match. But look at the space use on the pool (the REFER space in the pool listing).

# zfs list mypool/lamb
NAME          USED  AVAIL  REFER  MOUNTPOINT
mypool/lamb  10.2M  13.7G  10.1M  /lamb

Only the 10 MB we wrote were used. No extra copy was made of this file, as you wrote it before changing the copies property. With copies set to 2, however, if we either write another file or overwrite the original file, we’ll see different disk usage.

# dd if=/dev/random of=/lamb/random2 bs=1m count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.141795 secs (73950181 bytes/sec)

Look at disk usage now.

# zfs list mypool/lamb
NAME          USED  AVAIL  REFER  MOUNTPOINT
mypool/lamb  30.2M  13.7G  30.1M  /lamb

The total space usage is 30 MB, 10 for the first file of random data, and 20 for 2 copies of the second 10 MB file. When we look at the files with ls(1), they only show the actual size:

# ls -l /lamb/random*
-rw-r--r--  1 root  wheel  10485760 Apr  6 15:27 /lamb/random1
-rw-r--r--  1 root  wheel  10485760 Apr  6 15:29 /lamb/random2

If you really want to muck with your dataset’s resilience, look at metadata redundancy.

Metadata Redundancy

Each dataset stores an extra copy of its internal metadata, so that if a single block is corrupted, the amount of user data lost is limited. This extra copy is in addition to any redundancy provided at the VDEV level (e.g., by mirroring or RAID-Z). It’s also in addition to any extra copies specified by the copies property (above), up to a total of three copies.

The redundant_metadata property lets you decide how redundant you want your dataset metadata to be. Most users should never change this property.

When redundant_metadata is set to all (the default), ZFS stores an extra copy of all metadata. If a single on-disk block is corrupt, at worst a single block of user data can be lost.

When you set redundant_metadata to most, ZFS stores an extra copy of only most types of metadata. This can improve performance of random writes, because less metadata must be written. When only most metadata is redundant, at worst about 100 blocks of user data can be lost if a single on-disk block is corrupt. The exact behavior of which metadata blocks are stored redundantly may change in future releases.

If you set redundant_metadata to most and copies to 3, and the dataset lives on a mirrored pool, then ZFS stores six copies of most metadata, and four copies of data and some metadata.

This property was designed for specific use cases that frequently update metadata, such as databases. If the data is already protected by sufficiently strong fault tolerance, reducing the number of copies of the metadata that must be written each time the database changes can improve performance. Change this value only if you know what you are doing.
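
If you do know what you are doing, the change looks like any other property setting (the dataset name is illustrative):

# zfs set redundant_metadata=most mypool/dbdata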

Now that you have a grip on datasets, let’s talk about pool maintenance.

1 Probably badly.

2 Properly written setuid programs are not risky. That’s why real setuid programs are risky.

3 When you name ZFS properties after yourself, you are immortalized by your work. Whether this is good or bad depends on your work.