Chapter 4. ZFS Datasets

With ordinary filesystems you create partitions to separate different types of data, apply different optimizations to them, and limit how much of your space the partition can consume. Each partition receives a specific amount of space from the disk. We’ve all been there. We make our best guesses at how much disk space each partition on this system will need next month, next year, and five years from now. Fast forward to the future, and the amount of space you decided to give each partition is more than likely wrong. A partition without enough space for all its data sends you adding disks or moving data, complicating system management. When a partition has too much space, you kick yourself and use it as a dumping ground for stuff you’d rather have elsewhere. More than one of Lucas’ UFS2 systems has /usr/ports as a symlink to somewhere in /home. Jude usually ends up with some part of /var living in /usr/local/var.

ZFS solves this problem by pooling free space, giving your partitions flexibility impossible with more common filesystems. Each ZFS dataset you create consumes only the space required to store the files within it. Each dataset has access to all of the free space in the pool, eliminating your worries about the size of your partitions. You can limit the size of a dataset with a quota or guarantee it a minimum amount of space with a reservation, as discussed in Chapter 6.

Regular filesystems use the separate partitions to establish different policies and optimizations for the different types of data. /var contains often-changing files like logs and databases. The root filesystem needs consistency and safety over performance. Over in /home, anything goes. Once you establish a policy for a traditional filesystem, though, it’s really hard to change. The tunefs(8) utility for UFS requires the filesystem be unmounted to make changes. Some characteristics, such as the number of inodes, just cannot be changed after the filesystem has been created.

The core problem of traditional filesystems distills to inflexibility. ZFS datasets are almost infinitely flexible.

Datasets

A dataset is a named chunk of data. This data might resemble a traditional filesystem, with files, directories, and permissions and all that fun stuff. It could be a raw block device, or a copy of other data, or anything you can cram onto a disk.

ZFS uses datasets much like a traditional filesystem might use partitions. Need a policy for /usr and a separate policy for /home? Make each a dataset. Need a block device for an iSCSI target? That’s a dataset. Want a copy of a dataset? That’s another dataset.

Datasets have a hierarchical relationship. A single storage pool is the parent of each top-level dataset. Each dataset can have child datasets. Datasets inherit many characteristics from their parent, as we’ll see throughout this chapter.

You’ll perform all dataset operations with the zfs(8) command. This command has all sorts of sub-commands.

Dataset Types

ZFS currently has five types of datasets: filesystems, volumes, snapshots, clones, and bookmarks.

A filesystem dataset resembles a traditional filesystem. It stores files and directories. A ZFS filesystem has a mount point and supports traditional filesystem characteristics like read-only, restricting setuid binaries, and more. Filesystem datasets also hold other information, including permissions, timestamps for file creation and modification, NFSv4 Access Control Flags, chflags(2), and the like.

A ZFS volume, or zvol, is a block device. In an ordinary filesystem, you might create a file-backed filesystem for iSCSI or a special-purpose UFS partition. On ZFS, these block devices bypass all the overhead of files and directories and reside directly on the underlying pool. Zvols get a device node, skipping the FreeBSD memory devices used to mount disk images.

A snapshot is a read-only copy of a dataset from a specific point in time. Snapshots let you retain previous versions of your filesystem and the files therein for later use. Snapshots use an amount of space based on the difference between the current filesystem and what’s in the snapshot.

A clone is a new dataset based on a snapshot of an existing dataset, allowing you to fork a filesystem. You get an extra copy of everything in the dataset. You might clone the dataset containing your production web site, giving you a copy of the site that you can hack on without touching the production site. A clone only consumes space to store the differences from the original snapshot it was created from. Chapter 7 covers snapshots, clones, and bookmarks.

Why Do I Want Datasets?

You obviously need datasets. Putting files on the disk requires a filesystem dataset. And you probably want a dataset for each traditional Unix partition, like /usr and /var. But with ZFS, you want a lot of datasets. Lots and lots and lots of datasets. This would be cruel madness with a traditional filesystem, with its hard-coded limits on the number of partitions and the inflexibility of those partitions. But using many datasets increases the control you have over your data.

Each ZFS dataset has a series of properties that control its operation, allowing the administrator to control how the dataset performs and how carefully it protects its data. You can tune each dataset exactly as you can with a traditional filesystem. Dataset properties work much like pool properties.

The sysadmin can delegate control over individual datasets to another user, allowing that user to manage them without root privileges. If your organization has a whole bunch of project teams, you can give each project manager their own chunk of space and say, “Here, arrange it however you want.” Anything that reduces our workload is a good thing.

Many ZFS features, such as replication and snapshots, operate on a per-dataset basis. Separating your data into logical groups makes it easier to use these ZFS features to support your organization.

Take the example of a web server with dozens of sites, each maintained by different teams. Some teams are responsible for multiple sites, while others have only one. Some people belong to multiple teams. If you follow the traditional filesystem model, you might create a /webserver dataset, put everything in it, and control access with group permissions and sudo(8). You’ve lived like this for decades, and it works, so why change?

But create a dataset for each team, and give each site its own dataset within that parent dataset, and possibilities multiply.

A team needs a copy of a web site for testing? Clone it. With traditional filesystems, you’d have to copy the whole site directory, doubling the amount of disk needed for the site and taking much, much longer. A clone uses only the amount of space for the differences between the sites and appears instantaneously.

The team is about to deploy a new version of a site, but wants a backup of the old site? Create a snapshot. This new site probably uses a whole bunch of the same files as the old one, so you’ll reduce disk space usage. Plus, when the deployment goes horribly wrong, you can restore the old version by rolling back to the snapshot.

A particular web site needs filesystem-level performance tweaks, or compression, or some locally created property? Set it for that site.

You might create a dataset for each team, and then let the teams create their own child datasets for their own sites. You can organize your datasets to fit your people, rather than organizing your people to fit your technology.
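Here is a minimal sketch of that layout, assuming a pool named mypool, a hypothetical team called webteam, and two hypothetical sites called blog and shop. The plan_team_datasets function is our own illustration, not part of zfs(8). It only prints zfs create commands so you can review them before anything touches the pool.

```shell
# Print the zfs create commands for one team dataset and its site datasets.
# Usage: plan_team_datasets pool team [site ...]
# The pool, team, and site names used below are hypothetical.
plan_team_datasets() {
    pool=$1
    team=$2
    shift 2
    # The team dataset is the parent, so it must exist before its children.
    echo "zfs create $pool/$team"
    for site in "$@"; do
        echo "zfs create $pool/$team/$site"
    done
}

plan_team_datasets mypool webteam blog shop
```

Once the printed commands look right, pipe the output to sh(1) to run them. Each site dataset inherits its properties from the team dataset unless you set them on the site itself.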

When you must change a filesystem setting (property) on all of the sites, make the change to the parent dataset and let the children inherit it.

The same benefits apply to user home directories.

You can also move datasets between machines. Your web sites overflow the web server? Send half the datasets, along with their custom settings and all their clones and snapshots, to the new server.

There is one disadvantage to using many filesystem datasets. When you move a file within a filesystem, the file is renamed. Moving files between separate filesystems requires copying the file to a new location and deleting it from the old, rather than just renaming it. Inter-dataset file copies take more time and require more free space. But that’s trivial against all the benefits ZFS gives you with multiple datasets. This problem exists on other filesystems as well, but hosts using most other filesystems have only a few partitions, making it less obvious.

Viewing Datasets

The zfs list command shows all of the datasets, and some basic information about them.

# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
mypool                420M  17.9G    96K  none
mypool/ROOT           418M  17.9G    96K  none
mypool/ROOT/default   418M  17.9G   418M  /
...

The first field shows the dataset’s name.

Under USED and REFER you find information about how much disk space the dataset uses. One downside to ZFS’ incredible flexibility and efficiency is that its interpretation of disk space usage seems somewhat surreal if you don’t understand it. Chapter 6 discusses disk space and strategies to use it.

The AVAIL column shows how much space remains free in the pool or dataset.

Finally MOUNTPOINT shows where the dataset should be mounted. That doesn’t mean that the dataset is mounted, merely that if it were to be mounted, this is where it would go. (Use zfs mount to see all mounted ZFS filesystems.)

If you give a dataset as an argument, zfs list shows only that specific dataset.

# zfs list mypool/lamb
NAME          USED  AVAIL  REFER  MOUNTPOINT
mypool/lamb   192K  17.9G    96K  /lamb

Restrict the type of dataset shown with the -t flag and the type. You can show filesystems, volumes, or snapshots. Here we display snapshots, and only snapshots.

# zfs list -t snapshot
NAME                     USED  AVAIL  REFER  MOUNTPOINT
zroot/var/log/db@backup     0      -  10.0G  -
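If you want to feed this listing to a script, the -H flag drops the header and separates fields with tabs, and -o picks the columns. The sketch below filters such output down to the datasets that have a real path as their mount point. The function name is our own invention, and the sample input mirrors the mypool listing above rather than coming from a live system.

```shell
# Read "name<TAB>mountpoint" lines on standard input, in the format of
#   zfs list -H -o name,mountpoint
# and print only the datasets whose mount point is an actual path.
datasets_with_mountpoint() {
    awk -F '\t' 'substr($2, 1, 1) == "/" { print $1 }'
}

# Sample input for illustration; on a live system you would pipe the
# zfs list command into the function instead.
printf 'mypool\tnone\nmypool/ROOT\tnone\nmypool/ROOT/default\t/\n' |
    datasets_with_mountpoint
```

Testing for a leading slash skips values such as none and the dash shown for snapshots and volumes.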

Now that you can see filesystems, let’s make some.

Creating, Moving, and Destroying Datasets

Use the zfs create command to create any dataset. We’ll look at snapshots, clones, and bookmarks in Chapter 7, but let’s discuss filesystems and volumes now.

Creating Filesystems

Filesystems are the most common type of dataset on most systems. Everyone needs a place to store and organize files. Create a filesystem dataset by specifying the pool and the filesystem name.

# zfs create mypool/lamb

This creates a new dataset, lamb, on the ZFS pool called mypool. If the pool has a default mount point, the new dataset is mounted by default (see “Mounting ZFS Filesystems” later in this chapter).

# mount | grep lamb
mypool/lamb on /lamb (zfs, local, noatime, nfsv4acls)

The mount settings in parentheses are usually ZFS properties, inherited from the parent dataset. To create a child filesystem, give the full path to the parent filesystem.

# zfs create mypool/lamb/baby

The dataset inherits many of its characteristics, including its mount point, from the parent, as we’ll see in “Parent/Child Relationships” later in this chapter.

Creating Volumes

Use the -V flag and a volume size to tell zfs create that you want to create a volume. Give the full path to the volume dataset.

# zfs create -V 4G mypool/avolume

Zvols show up in a dataset list like any other dataset. You can tell zfs list to show only zvols by adding the -t volume option.

# zfs list mypool/avolume
NAME             USED  AVAIL  REFER  MOUNTPOINT
mypool/avolume  4.13G  17.9G    64K  -

Zvols automatically reserve an amount of space equal to the size of the volume plus the ZFS metadata. This 4 GB zvol uses 4.13 GB of space.

As block devices, zvols do not have a mount point. They do get a device node under /dev/zvol, so you can access them as you would any other block device.

# ls -al /dev/zvol/mypool/avolume
crw-r-----  1 root  operator  0x4d Mar 27 20:22 /dev/zvol/mypool/avolume

You can run newfs(8) on this device node, copy a disk image to it, and generally use it like any other block device.
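As a rough sketch, putting a UFS filesystem on the zvol might look like the following. The zvol_device helper is hypothetical and simply builds the device path from the dataset name. The /mnt mount point is an assumption. The commands are printed rather than run, so you can check the device name first.

```shell
# Map a volume dataset name to its device node under /dev/zvol.
zvol_device() {
    printf '/dev/zvol/%s\n' "$1"
}

# Print the commands to create a UFS filesystem on the zvol and mount it.
# /mnt is an assumed mount point; nothing is executed here.
dev=$(zvol_device mypool/avolume)
echo "newfs $dev"
echo "mount $dev /mnt"
```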

Renaming Datasets

You can rename a dataset with, oddly enough, the zfs rename command. Give the dataset’s current name as the first argument and the new location as the second.

# zfs rename db/production db/old
# zfs rename db/testing db/production

Use the -f flag to forcibly rename the dataset. You cannot unmount a filesystem with processes running in it, but the -f flag gleefully forces the unmount. Any process using the dataset loses access to whatever it was using, and reacts however it will.

Moving Datasets

You can move a dataset from part of the ZFS tree to another, making the dataset a child of its new parent. This may cause many of the dataset’s properties to change, since children inherit properties from their parent. Any properties set specifically on the dataset will not change.

Here we move a database out from under the zroot/var/db dataset, to a new parent where you have set some properties to improve fault tolerance.

# zfs rename zroot/var/db/mysql zroot/important/mysql

Note that since mount points are inherited, this will likely change the dataset’s mount point. Adding the -u flag to the rename command will cause ZFS not to immediately change the mount point, giving you time to reset the property to the intended value. Remember that if the machine is restarted, or the dataset is manually remounted, it will use its new mount point.
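One way to handle this, sketched below, is to rename with -u and then set the mountpoint property on the moved dataset so that it no longer depends on the new parent. The plan_move function is our own illustration, and /var/db/mysql is an assumed mount point for this example. The function prints the commands instead of running them.

```shell
# Print the commands to move a dataset without remounting it right away,
# then pin its mount point so it does not follow the new parent.
# Usage: plan_move old_name new_name mountpoint
plan_move() {
    old=$1
    new=$2
    mountpoint=$3
    # -u leaves the filesystem mounted where it is during the rename.
    echo "zfs rename -u $old $new"
    # A locally set mountpoint overrides the value inherited from the parent.
    echo "zfs set mountpoint=$mountpoint $new"
}

# /var/db/mysql is an assumed mount point for this example.
plan_move zroot/var/db/mysql zroot/important/mysql /var/db/mysql
```

Setting the property locally means a reboot or remount puts the dataset back where the database expects it.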

You can rename a snapshot, but you cannot move snapshots out of their parent dataset. Snapshots are covered in detail in Chapter 7.

Destroying Datasets

Sick of that dataset? Drag it out behind the barn and put it out of your misery with zfs destroy.

# zfs destroy db/old

If you add the -r flag, you recursively destroy all children (datasets, snapshots, etc.) of the dataset. To destroy any cloned datasets while you’re at it, use -R. Be very careful recursively destroying datasets, as you can frequently be surprised by what, exactly, is a child of a dataset.

You might use the -v and -n flags to see exactly what will happen when you destroy a dataset. The -v flag prints verbose information about what gets destroyed, while -n tells zfs(8) to perform a dry run. Between the two, they show what this command would actually destroy before you pull the trigger.
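A small sketch of that habit: the hypothetical destroy_cmd function below builds a recursive destroy command that defaults to a dry run, and only drops the -n flag when you pass the word really as a second argument. Both the function and the keyword are our own conventions, not anything zfs(8) provides.

```shell
# Build a recursive destroy command line for a dataset.
# Usage: destroy_cmd dataset [really]
# Without "really", the -n flag is included, so zfs(8) would only report
# what it would destroy.
destroy_cmd() {
    dataset=$1
    mode=${2:-preview}
    if [ "$mode" = "really" ]; then
        echo "zfs destroy -rv $dataset"
    else
        echo "zfs destroy -rvn $dataset"
    fi
}

destroy_cmd db/old          # prints the dry-run form
destroy_cmd db/old really   # prints the destructive form
```

Run the dry-run form first and read its output before you even think about the second form.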

ZFS Properties

ZFS datasets have a number of settings, called properties, that control how the dataset works. While you can set a few of these only when you create the dataset, most of them are tunable while the dataset is live. ZFS also offers a number of read-only properties that provide information such as the amount of space consumed by the dataset, the compression or deduplication ratios, and the creation time of the dataset.

Each dataset inherits its properties from its parent, unless the property is specifically set on that dataset.

Viewing Properties

The zfs(8) tool can retrieve a specific property, or all properties for a dataset. Use the zfs get command, followed by the property name and the dataset name.

# zfs get compression mypool/lamb
NAME         PROPERTY     VALUE  SOURCE
mypool/lamb  compression  lz4    inherited from mypool

Under NAME we see the dataset you asked about, and PROPERTY shows the property you requested. The VALUE is what the property is set to.

The SOURCE is a little more complicated. A source of default means that this property is set to ZFS’ default. A local source means that someone deliberately set this property on this dataset. A temporary property was set when the dataset was mounted, and this property reverts to its usual value when the dataset is unmounted. An inherited property comes from a parent dataset, as discussed in “Parent/Child Relationships” later in this chapter.

Some properties have no source because the source is either irrelevant or inherently obvious. The creation property, which records the date and time the dataset was created, has no source. The value came from the system clock.
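The source field is handy in scripts. As a sketch, the function below reads tab-separated zfs get output, as produced with the -H flag and -o name,property,value,source, and prints only the properties that were set locally. The function name is hypothetical, and the sample lines are illustrative values rather than output captured from a real pool.

```shell
# Read "name<TAB>property<TAB>value<TAB>source" lines, in the format of
#   zfs get -H -o name,property,value,source ...
# and print only the properties whose source is "local".
locally_set() {
    awk -F '\t' '$4 == "local" { printf "%s %s=%s\n", $1, $2, $3 }'
}

# Illustrative sample input; pipe a real zfs get command in instead.
printf 'mypool/lamb\tcompression\tlz4\tinherited from mypool\nmypool/lamb/baby\tcompression\toff\tlocal\n' |
    locally_set
```

Anything this prints is a setting someone chose deliberately on that dataset, which makes it a good starting point when auditing a pool.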

If you don’t specify a dataset name, zfs get shows the value of this property for all datasets. The special property keyword all retrieves all of a dataset’s properties.

# zfs get all mypool/lamb
NAME         PROPERTY  VALUE                  SOURCE
mypool/lamb  type      filesystem             -
mypool/lamb  creation  Fri Mar 27 20:05 2015  -
mypool/lamb  used      192K                   -
...

If you use all and don’t give a dataset name, you get all the properties for all datasets. This is a lot of information. Show multiple properties by separating the property names with commas.

# zfs get quota,reservation zroot/home
NAME        PROPERTY     VALUE  SOURCE
zroot/home  quota        none   local
zroot/home  reservation  none   default

You can also view properties with zfs list and the -o modifier. This is most suited for when you want to view several properties from multiple datasets. Use the special property called name to show the dataset’s name.

# zfs list -o name,quota,reservation
NAME                QUOTA  RESERV
db                   none    none
zroot                none    none
zroot/ROOT           none    none
zroot/ROOT/default   none    none
...
zroot/var/log        100G     20G
...

You can also add a dataset name to see these properties in this format for that dataset.

Changing Properties

Change properties with the zfs set command. Give the property name, the new setting, and the dataset name. Here we change the compression property to off.

# zfs set compression=off mypool/lamb/baby

Confirm your change with zfs get.

# zfs get compression mypool/lamb/baby
NAME              PROPERTY     VALUE  SOURCE
mypool/lamb/baby  compression  off    local

Most properties apply only to data written after the property is changed. The compression property tells ZFS to compress data before writing it to disk. We talk about compression in Chapter 6. Disabling compression doesn’t uncompress any data written before the change was made. Similarly, enabling compression doesn’t magically compress data already on the disk. To get the full benefit of enabling compression, you must rewrite every file. You’re better off creating a new dataset, copying the data over with zfs send, and destroying the original dataset.
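The copy-and-replace approach might be sketched as follows. This is an outline under several assumptions, not a tested procedure: the dataset has no children you care about, the compression setting you want is inherited from the parent rather than set on the dataset itself, and the pool has room for a second copy. The snapshot name rewrite and the .new suffix are hypothetical, and the function only prints the commands.

```shell
# Print the steps to rewrite a dataset's data through zfs send so that the
# copy is written under the compression setting it inherits.
# Usage: plan_rewrite dataset
# The snapshot name "rewrite" and the ".new" suffix are hypothetical.
plan_rewrite() {
    ds=$1
    # zfs send works from a snapshot, so take one first.
    echo "zfs snapshot $ds@rewrite"
    # A plain send stream is written out again by the receiving dataset.
    echo "zfs send $ds@rewrite | zfs receive $ds.new"
    # -r is needed because the original now has a snapshot.
    echo "zfs destroy -r $ds"
    echo "zfs rename $ds.new $ds"
}

plan_rewrite mypool/lamb
```

Check the copy before running the destroy step, because that step also removes every snapshot and child of the original dataset.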

Read-Only Properties

ZFS uses read-only properties to offer basic information about the dataset. Disk space usage is expressed as properties. You can’t change how much data you’re using by changing the property that says “your disk is half-full.” (Chapter 6 covers ZFS disk space usage.) The creation property records when this dataset was created. You can change many read-only properties by adding or removing data to the disk, but you can’t write these properties directly.

Filesystem Properties

One key tool for managing the performance and behavior of traditional filesystems is mount options. You can mount traditional filesystems read-only, or use the noexec flag to disable running programs from them. ZFS uses properties to achieve the same effects. Here are the properties used to accomplish these familiar goals.

atime

A file’s atime indicates when the file was last accessed. ZFS’ atime property controls whether the dataset tracks access times. The default value, on, updates the file’s atime metadata every time the file is accessed. Using atime means writing to the disk every time it’s read.

Turning this property off avoids writing to the disk when you read a file, and can result in significant performance gains. It might confuse mailers and other similar utilities that depend on being able to determine when a file was last read.

Leaving atime on increases snapshot size. The first time a file is accessed, its atime is updated. The snapshot retains the original access time, while the live filesystem contains the newly updated access time. Since atime is on by default, this is what happens unless you turn it off.

exec

The exec property determines if anyone can run binaries and commands on this filesystem. The default is on, which permits execution. Some environments don’t permit users to execute programs from their personal or temporary directories. Set the exec property to off to disable execution of programs on the filesystem.

The exec property doesn’t prohibit people from running interpreted scripts, however. If a user can run /bin/sh, they can run /bin/sh /home/mydir/script.sh. The shell is what’s actually executing—it only takes instructions from the script.

readonly

If you don’t want anything writing to this dataset, set the readonly property to on. The default, off, lets users modify the dataset within administrative permissions.

setuid

Many people consider setuid programs risky. While some setuid programs must be setuid, such as passwd(1) and login(1), there’s rarely a need to have setuid programs on filesystems like /home and /tmp. Many sysadmins disallow setuid programs except on specific filesystems.

ZFS’ setuid property toggles setuid support. If set to on, the filesystem supports setuid. If set to off, the setuid flag is ignored.
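To tie the exec and setuid properties together, here is a small sketch that prints the zfs set commands for turning both off on a list of datasets. The lockdown_cmds function is our own illustration. The zroot/home dataset appears earlier in this chapter, while zroot/tmp is assumed for the example.

```shell
# Print zfs set commands that turn off program execution and setuid
# support on each dataset named on the command line.
# Usage: lockdown_cmds dataset...
lockdown_cmds() {
    for ds in "$@"; do
        echo "zfs set exec=off $ds"
        echo "zfs set setuid=off $ds"
    done
}

# zroot/tmp is an assumed dataset name for this example.
lockdown_cmds zroot/home zroot/tmp
```

Child datasets inherit both settings, so setting them on a parent such as zroot/home covers the home directories beneath it unless a child overrides them.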

User-Defined Properties

ZFS properties are great, and you can’t get enough of them, right? Well, start adding your own. The ability to store your own metadata along with your datasets lets you develop whole new realms of automation. The fact that children automatically inherit these properties makes life even easier.