Linux ZFS Notes
I've been playing around with ZFS. ZoL (ZFS on Linux) seems to be maturing nicely, and I thought it was worth spending a day with it to learn the basics. I don't think I'm ready to take the plunge with anything serious yet, but it's an interesting and very likable FS/LVM. It supposedly supports up to 256 ZiB (zebibytes). These are my notes for Debian.
General advice for running ZFS
- Have 8GB ECC RAM + 1GB for every additional 1TB of storage.
^ Kind of. RAM matters more for performance than for capacity, but the more the better.
ECC RAM is preferred, but no more so than with any other filesystem. Matt Ahrens, one of the founders of ZFS at Sun Microsystems and now a founding member of OpenZFS, said in a forum post (2014):
"There's nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem. If you use UFS, EXT, NTFS, btrfs, etc without ECC RAM, you are just as much at risk as if you used ZFS without ECC RAM. Actually, ZFS can mitigate this risk to some degree if you enable the unsupported ZFS_DEBUG_MODIFY flag (zfs_flags=0x10). This will checksum the data while at rest in memory, and verify it before writing to disk, thus reducing the window of vulnerability from a memory error. I would simply say: if you love your data, use ECC RAM. Additionally, use a filesystem that checksums your data, such as ZFS."
Installing ZFS and handy extras
# apt update
# apt install linux-headers-$(uname -r)
# apt install zfs-dkms zfsutils-linux zfs-initramfs parted ntfs-3g samba
^ Note and update to self: zfs-dkms builds the kernel modules and is not needed on Ubuntu. zfs-initramfs is now referenced by zfsutils-linux as well.
List available drives to use (many ways to do this)
# lsblk
There are 4 types of device names you can use with ZFS/zpool:
- /dev/sdX: Best for development/test pools, as the names are not persistent.
- /dev/disk/by-id/: Nice for small systems with a single disk controller; allows mixing disks without import problems.
- /dev/disk/by-path/: Good for large pools, as the name describes the PCI bus number, enclosure name and port number.
- /dev/disk/by-vdev/: Best for large pools, but relies on having a /etc/zfs/vdev_id.conf file properly configured for your system.
^ It can be smart to set up aliases in /etc/zfs/vdev_id.conf, e.g. by shelf and slot number:
#     name    fully qualified or base name of device link
alias d1      /dev/disk/by-id/wwn-0x5000c5002de3b9ca
Run "udevadm trigger" afterwards.
Converting drives to GPT if they are bigger than 2TB
# parted /dev/<drive> mklabel gpt
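The 2TB limit comes from MBR's 32-bit sector addressing. A quick sketch of the arithmetic, assuming 512-byte logical sectors by default (the function names are made up for illustration):

```shell
#!/bin/sh
# Largest size in bytes an MBR partition table can address:
# 2^32 sectors times the logical sector size (assumed 512 if not given).
mbr_max_bytes() {
    sector_size="${1:-512}"
    echo $(( (1 << 32) * $sector_size ))
}

# Succeeds (exit 0) when a drive of the given size in bytes is too big for MBR.
needs_gpt() {
    [ "$1" -gt "$(mbr_max_bytes "${2:-512}")" ]
}

mbr_max_bytes    # prints 2199023255552, i.e. 2 TiB
```

The drive size in bytes can be read with something like blockdev --getsize64 /dev/sdX and passed to needs_gpt.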
Creating a simple Raid-0 pool with no redundancy
# zpool create -o ashift=12 mypool sde sdf sdg sdh
(ashift=12 forces 4096-byte sectors instead of the default auto-detection.)
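ashift is the base-2 exponent of the sector size, so 12 means 2^12 = 4096 bytes. A tiny sketch to convert other values (the helper name is just for illustration):

```shell
#!/bin/sh
# Sector size in bytes implied by an ashift value: 2^ashift.
ashift_to_bytes() {
    echo $(( 1 << $1 ))
}

# Print the sector size for a few common ashift values.
for a in 9 12 13; do
    echo "ashift=$a -> $(ashift_to_bytes "$a") bytes"
done
```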
Replacing a drive
# zpool offline mypool DiskID
(Hot-swap the drive after # hdparm -Y /dev/sdX, or shut down and change the drive.)
# zpool replace mypool DiskID NewDiskID
Alternatively, add the new disk as a hot spare:
# zpool add mypool spare NewDiskID
# zpool replace mypool DiskID NewDiskID
^ After resilvering the new disk into the pool, the old one will be taken offline.
Destroying a pool
# zpool destroy mypool
Checking the status of a pool
# zpool status mypool
# zfs list
# zpool status -x (nice for script-checking general pool health)
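A minimal sketch of such a health check, relying on zpool status -x printing the single line "all pools are healthy" when nothing is wrong. The function names are hypothetical, and check_pools is meant to be called from cron or similar:

```shell
#!/bin/sh
# Succeeds when stdin contains the one-line summary that
# "zpool status -x" prints when nothing is wrong.
is_healthy_report() {
    grep -qx 'all pools are healthy'
}

# Hypothetical wrapper: stay quiet when healthy, otherwise print the
# full report and return non-zero so the caller can raise an alert.
check_pools() {
    report="$(zpool status -x 2>&1)"
    if printf '%s\n' "$report" | is_healthy_report; then
        return 0
    fi
    printf '%s\n' "$report"
    return 1
}
```

Running check_pools from a cron job means you only get mail when a pool needs attention.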
Creating a Raid-1+0 (Raid-10) type pool
Mirror (RAID-1) based. Fastest, but only 50% storage capacity:
# zpool create -o ashift=12 mypool mirror sde sdf mirror sdg sdh
RAIDZ based. Slower, but more capacity depending on the number of drives per vdev:
# zpool create -o ashift=12 mypool raidz1 sde sdf sdg raidz1 sdh sdi sdj raidz1 sdk sdl sdm raidz1 sdn sdo sdp
Both of these setups stripe across vdevs and will be faster than a single big RAIDZ pool.
Relative speed, fastest to slowest: RAID-0, RAID-1, RAIDZ-1, RAIDZ-2, RAIDZ-3.
There are three RAID-Z modes, which distribute parity across the drives:
- RAID-Z1 (similar to RAID 5, allows one disk to fail)
- RAID-Z2 (similar to RAID 6, allows two disks to fail)
- RAID-Z3 (also referred to as RAID 7, allows three disks to fail)
Optimal number of drives per vdev for best performance:
- Start a single-parity RAIDZ (raidz) configuration at 3 or 5 disks (2^N+1)
- Start a double-parity RAIDZ (raidz2) configuration at 6 or 10 disks (2^N+2)
- Start a triple-parity RAIDZ (raidz3) configuration at 7 or 11 disks (2^N+3)
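The 2^N+P sizing rule (2^N data disks plus P parity disks) is easy to tabulate. A small sketch, with a made-up helper name:

```shell
#!/bin/sh
# Disks per vdev under the 2^N + P rule of thumb:
# 2^N data disks plus P parity disks (P = 1, 2 or 3 for raidz1/2/3).
raidz_width() {
    n="$1"
    parity="$2"
    echo $(( (1 << $n) + $parity ))
}

# List the widths for N = 1, 2, 3 at each parity level.
for p in 1 2 3; do
    echo "raidz$p: $(raidz_width 1 "$p") $(raidz_width 2 "$p") $(raidz_width 3 "$p") disks"
done
```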
Adding another vdev to the pool
# zpool add -o ashift=12 mypool mirror sdg sdh
Checking pool IO activity
# zpool iostat -v
Scrubbing a pool
# zpool scrub mypool
# zpool scrub -s mypool (if you need to stop it)
# zpool clear mypool (remove any no-longer-relevant error messages)
It's recommended to scrub at least once a week.
Creating a file system / mounting point / dataset
# zfs create mypool/mypool1
# mkdir /mnt/mypoolstorage (optional)
# zfs set mountpoint=/mnt/mypoolstorage mypool/mypool1 (optional)
# zfs list
# zfs destroy mypool/mypool1
You should only store data on these types of mount points / datasets and not directly on the pool, because each dataset has its own set of attributes that can be set.
Check automatic mounting on startup/shutdown
# nano /etc/default/zfs
ZFS_MOUNT='YES'
ZFS_UNMOUNT='YES'
Check all configuration values
# zpool get all mypool
# zfs get all mypool/<dataset>
Enabling automatic pool expansion
# zpool set autoexpand=on mypool
autoexpand: Must be set before replacing the first drive in your pool. Controls automatic pool expansion when the underlying LUN is grown. This setting is a boolean, with values either "on" or "off"; the default is "off". After all drives in the pool have been replaced with larger drives, the pool will automatically grow to the new size.
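Why the pool only grows once the last drive has been swapped: within a mirror or raidz vdev, every member is treated as if it were as small as the smallest one. A rough sketch of that rule, with a made-up helper name and sizes in TB:

```shell
#!/bin/sh
# Smallest of the given drive sizes. A mirror or raidz vdev sizes
# every member as if it were this small.
smallest_drive() {
    min="$1"
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then
            min="$size"
        fi
    done
    echo "$min"
}

smallest_drive 4 4 4    # before any replacement
smallest_drive 8 8 4    # two of three drives replaced: still held back by the 4
smallest_drive 8 8 8    # all replaced: the vdev can now expand
```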
Snapshots, Backups and Recovery
Creating a snapshot:
# zfs snapshot mypool/mypool1@mandag (dataset)
# zfs snapshot mypool@mandag (pool's root dataset; add -r to include all child datasets)
# zfs list -t all (or just -t snapshot)
Deleting a snapshot:
# zfs destroy mypool/mypool1@mandag
# zfs destroy mypool@mandag
Restoring data from snapshots:
# zfs rollback mypool/mypool1@mandag
# zfs rollback mypool@mandag
Making a backup to an image file:
# zfs send mypool/mypool1@mandag > /backup/mypool-mypool1.img
# zfs send mypool@mandag > /backup/mypool.img (normal)
# zfs send mypool@mandag | xz > /backup/mypool.img (compressed)
# zfs send mypool@mandag | xz | openssl enc -aes-256-cbc -a -salt > /backup/mypool.img (with encryption)
Recovering from an image file:
# zfs receive mypool/mypool1 < /backup/mypool-mypool1.img
If compressed and encrypted:
# openssl enc -d -aes-256-cbc -a -in /backup/mypool-mypool1.img | unxz | zfs receive mypool/mypool1
Sending and receiving over SSH:
# zfs send mypool/mypool1@mandag | ssh email@example.com "zfs receive mypool/mypool1"
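Naming snapshots by hand (like "mandag") gets old quickly. A small sketch that builds a dated snapshot name instead; the helper name and the YYYY-MM-DD format are my own assumptions, not anything ZFS requires:

```shell
#!/bin/sh
# Build "dataset@date". The date defaults to today in YYYY-MM-DD form,
# but can be passed explicitly as the second argument.
snapshot_name() {
    printf '%s@%s\n' "$1" "${2:-$(date +%F)}"
}

snapshot_name mypool/mypool1 2024-01-15    # prints mypool/mypool1@2024-01-15
snapshot_name mypool/mypool1               # uses today's date
```

It would be used as: zfs snapshot "$(snapshot_name mypool/mypool1)"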
Compression (set for entire pool before copying any files to the pool)
# zfs get compression mypool
# zfs set compression=lz4 mypool
# zfs get compressratio mypool
^ NOTE: compressratio should be 1.00x until data actually starts coming in. The CPU overhead of lz4 is so small that it makes little sense not to use it.
Moving to another system
# zpool upgrade -v; zfs upgrade -v
^ Match systems before migration.
# zpool export -f mypool
^ On the old system.
# zpool import mypool
^ On the new.
Sharing via Samba
# zfs create -o casesensitivity=mixed mypool/srv (to mimic Windows case-insensitivity)
# zfs set sharesmb=on mypool/srv
# nano /etc/samba/smb.conf
[Mypool1]
comment = Debian ZFS Pool
read only = no
locking = no
path = /mypool/mypool1
guest ok = no
# smbpasswd -a <new_samba_user>
# service samba reload
Remember to set suitable user rights on shared folders.