Chapter 4. ZFS Datasets

With ordinary filesystems you create partitions to separate different types of data, apply different optimizations to them, and limit how much of your space the partition can consume. Each partition receives a specific amount of space from the disk. We’ve all been there. We make our best guesses at how much disk space each partition on this system will need next month, next year, and five years from now. Fast forward to the future, and the amount of space you decided to give each partition is more than likely wrong. A partition without enough space for all its data sends you adding disks or moving data, complicating system management. When a partition has too much space, you kick yourself and use it as a dumping ground for stuff you’d rather have elsewhere. More than one of Lucas’ UFS2 systems has /usr/ports as a symlink to somewhere in /home. Jude usually ends up with some part of /var living in /usr/local/var.

ZFS solves this problem by pooling free space, giving your partitions flexibility impossible with more common filesystems. Each ZFS dataset you create consumes only the space required to store the files within it. Each dataset has access to all of the free space in the pool, eliminating your worries about the size of your partitions. You can limit the size of a dataset with a quota or guarantee it a minimum amount of space with a reservation, as discussed in Chapter 6.

Regular filesystems use the separate partitions to establish different policies and optimizations for the different types of data. /var contains often-changing files like logs and databases. The root filesystem needs consistency and safety over performance. Over in /home, anything goes. Once you establish a policy for a traditional filesystem, though, it’s really hard to change. The tunefs(8) utility for UFS requires the filesystem be unmounted to make changes. Some characteristics, such as the number of inodes, just cannot be changed after the filesystem has been created.

The core problem of traditional filesystems distills to inflexibility. ZFS datasets are almost infinitely flexible.

Datasets

A dataset is a named chunk of data. This data might resemble a traditional filesystem, with files, directories, and permissions and all that fun stuff. It could be a raw block device, or a copy of other data, or anything you can cram onto a disk.

ZFS uses datasets much like a traditional filesystem might use partitions. Need a policy for /usr and a separate policy for /home? Make each a dataset. Need a block device for an iSCSI target? That’s a dataset. Want a copy of a dataset? That’s another dataset.

Datasets have a hierarchical relationship. A single storage pool is the parent of each top-level dataset. Each dataset can have child datasets. Datasets inherit many characteristics from their parent, as we’ll see throughout this chapter.

You’ll perform all dataset operations with the zfs(8) command. This command has all sorts of sub-commands.

Dataset Types

ZFS currently has five types of datasets: filesystems, volumes, snapshots, clones, and bookmarks.

A filesystem dataset resembles a traditional filesystem. It stores files and directories. A ZFS filesystem has a mount point and supports traditional filesystem characteristics like read-only, restricting setuid binaries, and more. Filesystem datasets also hold other information, including permissions, timestamps for file creation and modification, NFSv4 Access Control Flags, chflags(2), and the like.

A ZFS volume, or zvol, is a block device. In an ordinary filesystem, you might create a file-backed filesystem for iSCSI or a special-purpose UFS partition. On ZFS, these block devices bypass all the overhead of files and directories and reside directly on the underlying pool. Zvols get a device node, skipping the FreeBSD memory devices used to mount disk images.

A snapshot is a read-only copy of a dataset from a specific point in time. Snapshots let you retain previous versions of your filesystem and the files therein for later use. Snapshots use an amount of space based on the difference between the current filesystem and what’s in the snapshot.

A clone is a new dataset based on a snapshot of an existing dataset, allowing you to fork a filesystem. You get an extra copy of everything in the dataset. You might clone the dataset containing your production web site, giving you a copy of the site that you can hack on without touching the production site. A clone only consumes space to store the differences from the original snapshot it was created from. Chapter 7 covers snapshots, clones, and bookmarks.

Why Do I Want Datasets?

You obviously need datasets. Putting files on the disk requires a filesystem dataset. And you probably want a dataset for each traditional Unix partition, like /usr and /var. But with ZFS, you want a lot of datasets. Lots and lots and lots of datasets. This would be cruel madness with a traditional filesystem, with its hard-coded limits on the number of partitions and the inflexibility of those partitions. But using many datasets increases the control you have over your data.

Each ZFS dataset has a series of properties that control its operation, allowing the administrator to control how the dataset performs and how carefully it protects its data. You can tune each dataset exactly as you can with a traditional filesystem. Dataset properties work much like pool properties.

The sysadmin can delegate control over individual datasets to another user, allowing that user to manage them without root privileges. If your organization has a whole bunch of project teams, you can give each project manager their own chunk of space and say, “Here, arrange it however you want.” Anything that reduces our workload is a good thing.

Many ZFS features, such as replication and snapshots, operate on a per-dataset basis. Separating your data into logical groups makes it easier to use these ZFS features to support your organization.

Take the example of a web server with dozens of sites, each maintained by different teams. Some teams are responsible for multiple sites, while others have only one. Some people belong to multiple teams. If you follow the traditional filesystem model, you might create a /webserver dataset, put everything in it, and control access with group permissions and sudo(8). You’ve lived like this for decades, and it works, so why change?

But create a dataset for each team, and give each site its own dataset within that parent dataset, and possibilities multiply.

A team needs a copy of a web site for testing? Clone it. With traditional filesystems, you’d have to copy the whole site directory, doubling the amount of disk needed for the site and taking much, much longer. A clone uses only the amount of space for the differences between the sites and appears instantaneously.

The team is about to deploy a new version of a site, but wants a backup of the old site? Create a snapshot. This new site probably uses a whole bunch of the same files as the old one, so you’ll reduce disk space usage. Plus, when the deployment goes horribly wrong, you can restore the old version by rolling back to the snapshot.

A particular web site needs filesystem-level performance tweaks, or compression, or some locally created property? Set it for that site.

You might create a dataset for each team, and then let the teams create their own child datasets for their own sites. You can organize your datasets to fit your people, rather than organizing your people to fit your technology.

When you must change a filesystem setting (property) on all of the sites, make the change to the parent dataset and let the children inherit it.

The same benefits apply to user home directories.

You can also move datasets between machines. Your web sites overflow the web server? Send half the datasets, along with their custom settings and all their clones and snapshots, to the new server.

There is one disadvantage to using many filesystem datasets. When you move a file within a filesystem, the file is renamed. Moving files between separate filesystems requires copying the file to a new location and deleting it from the old, rather than just renaming it. Inter-dataset file copies take more time and require more free space. But that’s trivial against all the benefits ZFS gives you with multiple datasets. This problem exists on other filesystems as well, but hosts using most other filesystems have only a few partitions, making it less obvious.

Viewing Datasets

The zfs list command shows all of the datasets, and some basic information about them.

# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
mypool               420M  17.9G    96K  none
mypool/ROOT          418M  17.9G    96K  none
mypool/ROOT/default  418M  17.9G   418M  /
...

The first field shows the dataset’s name.

Under USED and REFER you find information about how much disk space the dataset uses. One downside to ZFS’ incredible flexibility and efficiency is that its interpretation of disk space usage seems somewhat surreal if you don’t understand it. Chapter 6 discusses disk space and strategies to use it.

The AVAIL column shows how much space remains free in the pool or dataset.

Finally MOUNTPOINT shows where the dataset should be mounted. That doesn’t mean that the dataset is mounted, merely that if it were to be mounted, this is where it would go. (Use zfs mount to see all mounted ZFS filesystems.)

If you give a dataset as an argument, zfs list shows only that specific dataset.

# zfs list mypool/lamb
NAME         USED  AVAIL  REFER  MOUNTPOINT
mypool/lamb  192K  17.9G    96K  /lamb

Restrict the type of dataset shown with the -t flag and the type. You can show filesystems, volumes, or snapshots. Here we display snapshots, and only snapshots.

# zfs list -t snapshot
NAME                     USED  AVAIL  REFER  MOUNTPOINT
zroot/var/log/db@backup     0      -  10.0G  -
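When you feed zfs list into a script, the header and the column padding get in the way. The -H flag drops the header and separates fields with tabs, and -t all includes every dataset type. Here is a minimal sketch of a per-type tally. The helper name count_by_type is our own invention, and the sketch assumes awk and sort are available.

```sh
# count_by_type: hypothetical helper that tallies datasets per type.
# Any extra arguments (for example: -r mypool) are passed to zfs list.
count_by_type() {
    # -H: no header, tab-separated; -t all: every dataset type;
    # -o type: print only the type property.
    zfs list -H -t all -o type "$@" |
        awk '{ n[$1]++ } END { for (t in n) print t, n[t] }' |
        sort
}
```

Run count_by_type with no arguments for the whole system, or count_by_type -r mypool to restrict the tally to one pool.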

Now that you can see filesystems, let’s make some.

Creating, Moving, and Destroying Datasets

Use the zfs create command to create any dataset. We’ll look at snapshots, clones, and bookmarks in Chapter 7, but let’s discuss filesystems and volumes now.

Creating Filesystems

Filesystems are the most common type of dataset on most systems. Everyone needs a place to store and organize files. Create a filesystem dataset by specifying the pool and the filesystem name.

# zfs create mypool/lamb

This creates a new dataset, lamb, on the ZFS pool called mypool. If the pool has a default mount point, the new dataset is mounted by default (see “Mounting ZFS Filesystems” later this chapter).

# mount | grep lamb
mypool/lamb on /lamb (zfs, local, noatime, nfsv4acls)

The mount settings in parentheses are usually ZFS properties, inherited from the parent dataset. To create a child filesystem, give the full path to the parent filesystem.

# zfs create mypool/lamb/baby

The dataset inherits many of its characteristics, including its mount point, from the parent, as we’ll see in “Parent/Child Relationships” later in this chapter.

Creating Volumes

Use the -V flag and a volume size to tell zfs create that you want to create a volume. Give the full path to the volume dataset.

# zfs create -V 4G mypool/avolume

Zvols show up in a dataset list like any other dataset. You can tell zfs list to show only zvols by adding the -t volume option.

# zfs list mypool/avolume
NAME             USED  AVAIL  REFER  MOUNTPOINT
mypool/avolume  4.13G  17.9G    64K  -

Zvols automatically reserve an amount of space equal to the size of the volume plus the ZFS metadata. This 4 GB zvol uses 4.13 GB of space.

As block devices, zvols do not have a mount point. They do get a device node under /dev/zvol, so you can access them as you would any other block device.

# ls -al /dev/zvol/mypool/avolume
crw-r-----  1 root  operator  0x4d Mar 27 20:22 /dev/zvol/mypool/avolume

You can run newfs(8) on this device node, copy a disk image to it, and generally use it like any other block device.

Renaming Datasets

You can rename a dataset with, oddly enough, the zfs rename command. Give the dataset’s current name as the first argument and the new location as the second.

# zfs rename db/production db/old
# zfs rename db/testing db/production

Use the -f flag to forcibly rename the dataset. You cannot unmount a filesystem with processes running in it, but the -f flag gleefully forces the unmount. Any process using the dataset loses access to whatever it was using, and reacts however it will.

Moving Datasets

You can move a dataset from part of the ZFS tree to another, making the dataset a child of its new parent. This may cause many of the dataset’s properties to change, since children inherit properties from their parent. Any properties set specifically on the dataset will not change.

Here we move a database out from under the zroot/var/db dataset, to a new parent where you have set some properties to improve fault tolerance.

# zfs rename zroot/var/db/mysql zroot/important/mysql

Note that since mount points are inherited, this will likely change the dataset’s mount point. Adding the -u flag to the rename command will cause ZFS not to immediately change the mount point, giving you time to reset the property to the intended value. Remember that if the machine is restarted, or the dataset is manually remounted, it will use its new mount point.
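The -u workflow described above can be sketched as a small shell function. This is an illustration under assumptions, not a recommended tool: the name move_keep_mountpoint is hypothetical, and it assumes you want the dataset to keep exactly the mount point it had before the move. Because the last step changes the mountpoint property, expect the usual unmount and remount that goes with any mountpoint change, and try it on a scratch dataset first.

```sh
# move_keep_mountpoint: hypothetical helper sketching the -u workflow.
# Usage: move_keep_mountpoint OLD_NAME NEW_NAME
move_keep_mountpoint() {
    old=$1
    new=$2
    # Record the current mount point (-H -o value prints only the value).
    mp=$(zfs get -H -o value mountpoint "$old") || return 1
    # Rename without remounting, so the new parent's mount point
    # is not applied right away.
    zfs rename -u "$old" "$new" || return 1
    # Pin the old mount point as a local property on the renamed dataset.
    zfs set mountpoint="$mp" "$new"
}
```

For the move shown earlier, that would be move_keep_mountpoint zroot/var/db/mysql zroot/important/mysql.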

You can rename a snapshot, but you cannot move snapshots out of their parent dataset. Snapshots are covered in detail in Chapter 7.

Destroying Datasets

Sick of that dataset? Drag it out behind the barn and put it out of your misery with zfs destroy.

# zfs destroy db/old

If you add the -r flag, you recursively destroy all children (datasets, snapshots, etc.) of the dataset. To destroy any cloned datasets while you’re at it, use -R. Be very careful recursively destroying datasets, as you can frequently be surprised by what, exactly, is a child of a dataset.

You might use the -v and -n flags to see exactly what will happen when you destroy a dataset. The -v flag prints verbose information about what gets destroyed, while -n tells zfs(8) to perform a dry run. Between the two, they show what this command would actually destroy before you pull the trigger.
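One way to make the dry run a habit is to wrap it in a function that always previews first. The sketch below is only an illustration. The name destroy_with_preview and the "repeat the dataset name to confirm" convention are our own assumptions, not anything zfs(8) provides.

```sh
# destroy_with_preview: hypothetical wrapper around zfs destroy.
# Usage: destroy_with_preview DATASET [CONFIRM]
# The real destroy runs only when CONFIRM repeats the dataset name exactly.
destroy_with_preview() {
    dataset=$1
    confirm=${2:-}
    # -r: include children, -n: dry run, -v: verbose listing.
    zfs destroy -rnv "$dataset" || return 1
    if [ "$confirm" = "$dataset" ]; then
        zfs destroy -r "$dataset"
    else
        echo "dry run only; repeat the dataset name as a second argument to destroy"
    fi
}
```

Running destroy_with_preview db/old only lists what would go away. Running destroy_with_preview db/old db/old shows the list and then destroys the dataset and its children.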

ZFS Properties

ZFS datasets have a number of settings, called properties, that control how the dataset works. While you can set a few of these only when you create the dataset, most of them are tunable while the dataset is live. ZFS also offers a number of read-only properties that provide information such as the amount of space consumed by the dataset, the compression or deduplication ratios, and the creation time of the dataset.

Each dataset inherits its properties from its parent, unless the property is specifically set on that dataset.

Viewing Properties

The zfs(8) tool can retrieve a specific property, or all properties for a dataset. Use the zfs get command, the desired property, and if desired, a dataset name.

# zfs get compression mypool/lamb
NAME         PROPERTY     VALUE  SOURCE
mypool/lamb  compression  lz4    inherited from mypool

Under NAME we see the dataset you asked about, and PROPERTY shows the property you requested. The VALUE is what the property is set to.

The SOURCE is a little more complicated. A source of default means that this property is set to ZFS’ default. A local source means that someone deliberately set this property on this dataset. A temporary property was set when the dataset was mounted, and this property reverts to its usual value when the dataset is unmounted. An inherited property comes from a parent dataset, as discussed in “Parent/Child Relationships” later in this chapter.

Some properties have no source because the source is either irrelevant or inherently obvious. The creation property, which records the date and time the dataset was created, has no source. The value came from the system clock.
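The source is also something you can filter on. As a hedged sketch: zfs get accepts -s to restrict output to particular sources and -H to produce header-free, tab-separated output. The helper name local_properties is made up for this example.

```sh
# local_properties: hypothetical helper that lists every property someone
# set deliberately on a dataset or any of its children.
# Usage: local_properties DATASET
local_properties() {
    # -r: recurse into children, -s local: only locally set properties,
    # -H: no header, tab-separated fields.
    zfs get -r -H -s local -o name,property,value all "$1" |
        awk -F '\t' '{ printf "%s: %s=%s\n", $1, $2, $3 }'
}
```

This is handy when you inherit a system from someone else and want to know which settings were chosen on purpose rather than left at their defaults.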

If you don’t specify a dataset name, zfs get shows the value of this property for all datasets. The special property keyword all retrieves all of a dataset’s properties.

# zfs get all mypool/lamb
NAME         PROPERTY  VALUE                  SOURCE
mypool/lamb  type      filesystem             -
mypool/lamb  creation  Fri Mar 27 20:05 2015  -
mypool/lamb  used      192K                   -
...

If you use all and don’t give a dataset name, you get all the properties for all datasets. This is a lot of information. Show multiple properties by separating the property names with commas.

# zfs get quota,reservation zroot/home
NAME        PROPERTY     VALUE  SOURCE
zroot/home  quota        none   local
zroot/home  reservation  none   default

You can also view properties with zfs list and the -o modifier. This is most suited for when you want to view several properties from multiple datasets. Use the special property name to show the dataset’s name.

# zfs list -o name,quota,reservation
NAME                QUOTA  RESERV
db                   none    none
zroot                none    none
zroot/ROOT           none    none
zroot/ROOT/default   none    none
...
zroot/var/log        100G     20G
...

You can also add a dataset name to see these properties in this format for that dataset.
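On a system with many datasets, most rows in that listing say none. A short filter can pick out the interesting ones. This is a sketch under assumptions: limited_datasets is a hypothetical helper name, and it relies on the -H flag of zfs list to get tab-separated output without a header.

```sh
# limited_datasets: hypothetical helper that prints datasets whose quota
# or reservation is set to something other than "none".
# Extra arguments (for example: -r zroot) are passed to zfs list.
limited_datasets() {
    zfs list -H -o name,quota,reservation "$@" |
        awk -F '\t' '$2 != "none" || $3 != "none" { print $1 }'
}
```

Against the listing above, limited_datasets would print zroot/var/log and skip the rows shown with no limits.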

Changing Properties

Change properties with the zfs set command. Give the property name, the new setting, and the dataset name. Here we change the compression property to off.

# zfs set compression=off mypool/lamb/baby

Confirm your change with zfs get.

# zfs get compression mypool/lamb/baby
NAME              PROPERTY     VALUE  SOURCE
mypool/lamb/baby  compression  off    local

Most properties apply only to data written after the property is changed. The compression property tells ZFS to compress data before writing it to disk. We talk about compression in Chapter 6. Disabling compression doesn’t uncompress any data written before the change was made. Similarly, enabling compression doesn’t magically compress data already on the disk. To get the full benefit of enabling compression, you must rewrite every file. You’re better off creating a new dataset, copying the data over with zfs send, and destroying the original dataset.

Read-Only Properties

ZFS uses read-only properties to offer basic information about the dataset. Disk space usage is expressed as properties. You can’t change how much data you’re using by changing the property that says “your disk is half-full.” (Chapter 6 covers ZFS disk space usage.) The creation property records when this dataset was created. You can change many read-only properties by adding or removing data to the disk, but you can’t write these properties directly.

Filesystem Properties

One key tool for managing the performance and behavior of traditional filesystems is mount options. You can mount traditional filesystems read-only, or use the noexec flag to disable running programs from them. ZFS uses properties to achieve the same effects. Here are the properties used to accomplish these familiar goals.

atime

A file’s atime indicates when the file was last accessed. ZFS’ atime property controls whether the dataset tracks access times. The default value, on, updates the file’s atime metadata every time the file is accessed. Using atime means writing to the disk every time it’s read.

Turning this property off avoids writing to the disk when you read a file, and can result in significant performance gains. It might confuse mailers and other similar utilities that depend on being able to determine when a file was last read.

Leaving atime on, which is the default, increases snapshot size. The first time a file is accessed, its atime is updated. The snapshot retains the original access time, while the live filesystem contains the newly updated access time.

exec

The exec property determines if anyone can run binaries and commands on this filesystem. The default is on, which permits execution. Some environments don’t permit users to execute programs from their personal or temporary directories. Set the exec property to off to disable execution of programs on the filesystem.

The exec property doesn’t prohibit people from running interpreted scripts, however. If a user can run /bin/sh, they can run /bin/sh /home/mydir/script.sh. The shell is what’s actually executing—it only takes instructions from the script.

readonly

If you don’t want anything writing to this dataset, set the readonly property to on. The default, off, lets users modify the dataset within administrative permissions.

setuid

Many people consider setuid programs risky. While some setuid programs must be setuid, such as passwd(1) and login(1), there’s rarely a need to have setuid programs on filesystems like /home and /tmp. Many sysadmins disallow setuid programs except on specific filesystems.

ZFS’ setuid property toggles setuid support. If set to on, the filesystem supports setuid. If set to off, the setuid flag is ignored.

User-Defined Properties

ZFS properties are great, and you can’t get enough of them, right? Well, start adding your own. The ability to store your own metadata along with your datasets lets you develop whole new realms of automation. The fact that children automatically inherit these properties makes life even easier.

To make sure your custom properties remain yours, and don’t conflict with other people’s custom properties, create a namespace. Most people prefix their custom properties with an organizational identifier and a colon. For example, FreeBSD-specific properties have the format “org.freebsd:propertyname,” such as org.freebsd:swap. If the illumos project creates its own property named swap, they’d call it org.illumos:swap. The two values won’t collide.

For example, suppose Jude wants to control which datasets get backed up via a dataset property. He creates the namespace com.allanjude. Within that namespace, he creates the property backup_ignore.

# zfs set com.allanjude:backup_ignore=on mypool/lamb

Jude’s backup script checks the value of this property. If it’s set to on, the backup process skips this dataset.
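We don't have Jude's actual script, so what follows is only a guess at how the selection step might look. The function name datasets_to_backup is hypothetical. The sketch assumes the property is set to on, as in the command above, and relies on zfs get reporting a value of "-" for a user property that was never set.

```sh
# datasets_to_backup: sketch of the selection step in a backup script.
# Usage: datasets_to_backup POOL_OR_DATASET
# Prints every filesystem under the argument unless
# com.allanjude:backup_ignore is "on" (set locally or inherited).
datasets_to_backup() {
    # -H: no header, tab-separated; -r: recurse; -t filesystem: skip
    # volumes and snapshots; -o name,value: only the fields we need.
    zfs get -H -r -t filesystem -o name,value com.allanjude:backup_ignore "$1" |
        awk -F '\t' '$2 != "on" { print $1 }'
}
```

Because children inherit user properties, setting the property on mypool/lamb also excludes mypool/lamb/baby without any extra work in the script.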

Parent/Child Relationships

Datasets inherit properties from their parent datasets. When you set a property on a dataset, that property applies to that dataset and all of its children. For convenience, you can run zfs(8) commands on a dataset and all of its children by adding the -r flag. Here, we query the compression property on a dataset and all of its children.

# zfs get -r compression mypool/lamb
NAME              PROPERTY     VALUE  SOURCE
mypool/lamb       compression  lz4    inherited from mypool
mypool/lamb/baby  compression  off    local

Look at the source values. The first dataset, mypool/lamb, inherited this property from the parent pool. In the second dataset, this property has a different value. The source is local, meaning that the property was set specifically on this dataset.

We can restore the original setting with the zfs inherit command.

# zfs inherit compression mypool/lamb/baby
# zfs get -r compression mypool/lamb
NAME              PROPERTY     VALUE  SOURCE
mypool/lamb       compression  lz4    inherited from mypool
mypool/lamb/baby  compression  lz4    inherited from mypool

The child now inherits the compression properties from the parent, which inherits from the grandparent.

When you change a parent’s properties, the new properties automatically propagate down to the child.

# zfs set compression=gzip-9 mypool/lamb
# zfs get -r compression mypool/lamb
NAME              PROPERTY     VALUE   SOURCE
mypool/lamb       compression  gzip-9  local
mypool/lamb/baby  compression  gzip-9  inherited from mypool/lamb

I told the parent dataset to use gzip-9 compression. That percolated down to the child.
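If the lookup rule feels abstract, here is a toy model of it in plain shell. It does not talk to ZFS at all. It only mimics the rule described above: use the local setting if there is one, otherwise walk up the parents, and fall back to the default. The function name effective_value and the "dataset value" input format are our own assumptions for illustration.

```sh
# effective_value: toy model of ZFS property inheritance (does not call zfs).
# Usage: effective_value DATASET SETTINGS DEFAULT
# SETTINGS is text with one "dataset value" pair per line, listing only
# the datasets where the property was set locally.
effective_value() {
    target=$1
    settings=$2
    default=$3
    ds=$target
    while :; do
        # Look for a local setting on the dataset we are currently examining.
        val=$(printf '%s\n' "$settings" |
            awk -v d="$ds" '$1 == d { print $2; exit }')
        if [ -n "$val" ]; then
            if [ "$ds" = "$target" ]; then
                printf '%s local\n' "$val"
            else
                printf '%s inherited from %s\n' "$val" "$ds"
            fi
            return 0
        fi
        # Step up to the parent, or stop once we have checked the pool itself.
        case $ds in
            */*) ds=${ds%/*} ;;
            *)   break ;;
        esac
    done
    printf '%s default\n' "$default"
}
```

With local settings of lz4 on mypool and gzip-9 on mypool/lamb, asking about mypool/lamb/baby reports gzip-9 inherited from mypool/lamb, while a dataset under a parent with no local setting falls through to mypool and reports lz4.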

Inheritance and Renaming

When you move or rename a dataset so that it has a new parent, the parent’s properties automatically propagate down to the child. Locally set properties remain unchanged, but inherited ones switch to those from the new parent.

Here we create a new parent dataset and check its compression property.

# zfs create mypool/second
# zfs get compress mypool/second
NAME           PROPERTY     VALUE  SOURCE
mypool/second  compression  lz4    inherited from mypool

Our baby dataset uses gzip-9 compression. It’s inherited this property from mypool/lamb. Now let’s move baby to be a child of second, and see what happens to the compression property.

# zfs rename mypool/lamb/baby mypool/second/baby
# zfs get -r compression mypool/second
NAME                PROPERTY     VALUE  SOURCE
mypool/second       compression  lz4    inherited from mypool
mypool/second/baby  compression  lz4    inherited from mypool

The child dataset now belongs to a different parent, and inherits its properties from the new parent. The child keeps any local properties.

Data on the baby dataset is a bit of a tangle, however. Data written before compression was turned on is uncompressed. Data written while the dataset used gzip-9 compression is compressed with gzip-9. Any data written now will be compressed with lz4. ZFS sorts all this out for you automatically, but thinking about it does make one's head hurt.

Removing Properties

While you can set a property back to its default value, it’s not obvious how to change the source back to inherit or default, or how to remove custom properties once they’re set.

To remove a custom property, inherit it.

# zfs inherit com.allanjude:backup_ignore mypool/lamb

This works even if you set the property on the root dataset.

To reset a property to its default value on a dataset and all its children, or totally remove custom properties, use the zfs inherit command on the pool’s root dataset.

# zfs inherit -r compression mypool

It’s counterintuitive, but it knocks the custom setting off of the root dataset.

Mounting ZFS Filesystems

With traditional filesystems you listed each partition, its type, and where it should be mounted in /etc/fstab. You even listed temporary mounts such as floppies and CD-ROM drives, just for convenience. ZFS allows you to create such a large number of filesystems that this quickly grows impractical.

Each ZFS filesystem has a mountpoint property that defines where it should be mounted. The default mountpoint is built from the pool’s mountpoint. If a pool doesn’t have a mount point, you must assign a mount point to any datasets you want to mount.

# zfs get mountpoint zroot/usr/home
NAME            PROPERTY    VALUE      SOURCE
zroot/usr/home  mountpoint  /usr/home  inherited from zroot/usr

The filesystem normally gets mounted at /usr/home. You could override this when manually mounting the filesystem.

The zroot pool used for a default FreeBSD install doesn’t have a mount point set. If you create new datasets directly under zroot, they won’t have a mount point. Datasets created on zroot under, say, /usr, inherit a mount point from their parent dataset.

Any pool other than the pool with the root filesystem normally has a mount point named after the pool. If you create a pool named db, it gets mounted at /db. All children inherit their mount point from that pool unless you change them.

When you change the mountpoint property for a filesystem, the filesystem and any children that inherit the mount point are unmounted. If the new value is legacy, then they remain unmounted. Otherwise, they are automatically remounted in the new location if the property was previously legacy or none, or if they were mounted before the property was changed. In addition, any shared filesystems are unshared and shared in the new location.

Just like ordinary filesystems, ZFS filesystems aren’t necessarily mounted. The canmount property controls a filesystem’s mount behavior. If canmount is set to on, running zfs mount -a mounts the filesystem, just like mount -a. When you enable ZFS in /etc/rc.conf, FreeBSD runs zfs mount -a at startup.

When the canmount property is set to noauto, a dataset can only be mounted and unmounted explicitly. The dataset is not mounted automatically when the dataset is created or imported, nor is it mounted by the zfs mount -a command or unmounted by zfs unmount -a.

Things can get interesting when you set canmount to off. You might have two non-mountable datasets with the same mount point. A dataset can exist solely for the purpose of being the parent to future datasets, but not actually store files, as we’ll see below.

Child datasets do not inherit the canmount property.

Changing the canmount property does not automatically unmount or mount the filesystem. If you disable mounting on a mounted filesystem, you’ll need to manually unmount the filesystem or reboot.
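To check which filesystems you should expect zfs mount -a to pick up, you can combine the canmount and mountpoint properties in one listing. The following is a rough sketch, not a definitive test. The helper name automount_candidates is hypothetical, and the filter simply encodes the rules described above: canmount must be on, and the mount point must not be none or legacy.

```sh
# automount_candidates: hypothetical helper listing filesystems that
# zfs mount -a would be expected to mount.
# Extra arguments (for example: -r zroot) are passed to zfs list.
automount_candidates() {
    zfs list -H -t filesystem -o name,canmount,mountpoint "$@" |
        awk -F '\t' '$2 == "on" && $3 != "none" && $3 != "legacy" {
            print $1 " -> " $3
        }'
}
```

Datasets with canmount set to off or noauto, and datasets with no usable mount point, drop out of the list.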

Datasets without Mount Points

ZFS datasets are hierarchical. You might need to create a dataset that will never contain any files, purely so it can be the common parent of a number of other datasets. Consider a default install of FreeBSD 10.1 or newer.

# zfs mount
zroot/ROOT/default  /
zroot/tmp           /tmp
zroot/usr/home      /usr/home
zroot/usr/ports     /usr/ports
zroot/usr/src       /usr/src
...

We have all sorts of datasets under /usr, but there’s no /usr dataset mounted. What’s going on?

A zfs list shows that a dataset exists, and it has a mount point of /usr. But let’s check the mountpoint and canmount properties of zroot/usr and all its children.

# zfs list -o name,canmount,mountpoint -r zroot/usr
NAME             CANMOUNT  MOUNTPOINT
zroot/usr        off       /usr
zroot/usr/home   on        /usr/home
zroot/usr/ports  on        /usr/ports
zroot/usr/src    on        /usr/src

With canmount set to off, the zroot/usr dataset is never mounted. Any files written in /usr, such as the commands in /usr/bin and the packages in /usr/local, go into the root filesystem. Lower-level mount points such as /usr/src have their own datasets, which are mounted.

The dataset exists only to be a parent to the child datasets. You’ll see something similar with the /var partitions.

Multiple Datasets with the Same Mount Point

Setting canmount to off allows datasets to be used solely as a mechanism to inherit properties. One reason to set canmount to off is to have two datasets with the same mount point, so that the children of both datasets appear in the same directory, but might have different inherited characteristics.

FreeBSD’s installer does not set a mountpoint on the default pool, zroot. When you create a new dataset directly under the pool, you must assign a mount point to it.

If you don’t want to assign a mount point to every dataset you create right under the pool, you might assign a mountpoint of / to the zroot pool and leave canmount set to off. This way, when you create a new dataset, it has a mountpoint to inherit. This is a very simple example of using multiple datasets with the same mount point.

Imagine you want an /opt directory with two sets of subdirectories. Some of these directories contain programs, and should never be written to after installation. Other directories contain data. You must lock down the ability to run programs at the filesystem level.

# zfs create db/programs
# zfs create db/data

Now give both of these datasets the mountpoint of /opt and tell them that they cannot be mounted.

# zfs set canmount=off db/programs
# zfs set mountpoint=/opt db/programs