The core concept of NixOS is to maintain the hosts' programs and configuration in a stateless manner.
Hand in hand with that goes the need for (but also the advantages of) identifying anything stateful in the system.
Such state generally falls into one of three categories:
1) It is secret and/or can not be re-generated (without other information of the same kind), e.g. private keys or user data.
2) It may or may not be secret, but re-generating it takes time or is impractical, e.g. the nix store.
3) It may or may not be secret and can be re-generated quickly, e.g. caches and other runtime state.
Category 1 data can not be included in the nix store and thus can not (directly) be derived from the system configuration.
It needs to be stored in a way that the host (and only the host) can access it and that it can't be lost.
It is therefore referred to as `remote` data (even though it would usually also be stored locally).
Category 2 data is required for the system to boot (in reasonable time), and should thus, as `local` data, be stored persistently on the host.
Parts that are not secret should be included in or generated from the system configuration.
Anything that is secret likely shouldn't be lost (i.e. is actually category 1) or can be re-generated from randomness or category 1 (and is thus category 3).
Category 3 data is expendable, and, as `temp` data, can thus be cleared on reboot or at other times.
Temporary data is the cheapest to (not) maintain, esp. in terms of administrative overhead, and should be used wherever possible.
Note, though, that incorrectly assigning data to `temp` that should be `local` or `remote` may break the system or cause data loss.
TempRoot is the concept of defaulting everything to `temp` and selectively whitelisting things as `local` or `remote`, by mounting an ephemeral root (`/`) file system and mounting/binding/linking various, sometimes many, nested paths to persistent storage.
This module implements the concept with different filesystem options for `remote`, `local` and `temp` data, while maintaining the same general mount structure regardless, which should make the choice of backing storage largely transparent for anything running on the system.
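To illustrate the general shape of such a mount structure, here is a simplified sketch expressed directly as NixOS `fileSystems` entries (not this module's literal output; devices and paths are hypothetical):
```nix
{ # Ephemeral root: everything not explicitly persisted below is lost on reboot.
  fileSystems."/" = { fsType = "tmpfs"; options = [ "mode=755" ]; };
  # »local« data, primarily the nix store, on persistent storage:
  fileSystems."/nix" = { fsType = "ext4"; device = "/dev/disk/by-partlabel/local-..."; neededForBoot = true; };
  # »remote« data, bind-mounted from a filesystem that gets backed up:
  fileSystems."/home" = { device = "/.remote/home"; options = [ "bind" ]; };
}
```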
ZFS is capable of serving all three roles quite well.
Its pooling and datasets allow seamless allocation of storage between the categories; filesystem-level encryption and the sending of encrypted, incremental dataset snapshots make it excellent for the `remote` role.
As the backing for `local`, which primarily holds the nix store, it benefits from transparent compression and checksumming.
With `fsync` disabled, and the ability to roll back snapshots, it also works well for creating very large storage areas for `temp` data.
ZFS does, however, struggle on lower-end systems. BtrFS could probably be configured to serve the same roles with similar capability.
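The module applies such no-sync properties to `temp` datasets automatically (see `zfsNoSyncProps` below); individual mounts can be tuned further via their `zfsProps` option, for example (mount path and property value purely illustrative):
```nix
{ wip.fs.temproot.temp.mounts."/var/cache".zfsProps = { recordsize = "1M"; }; } # e.g. tune for large files
```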
F2FS also supports checksumming and compression, though it currently does not automatically reclaim space gained by the latter (but TODO: Nix could be tuned to do this explicitly).
Together with its design being optimized for flash storage, this should make F2FS a good backend for the `local` data, esp. on lower-end hardware.
EXT4 supports checksumming only for its metadata and does not support compression; block device layers could in principle be used to add both.
Using a simple filesystem together with external backup tools is possible, yet suboptimal, for the `remote` data, unless the system doesn't actually have any/much of it.
As long as the amount of `temp` data can be expected to stay within reasonable bounds, `tmpfs`es and swap can also be used to back the `temp` data.
The disk/partition declaration and the installer tooling refer to disks by GPT `partlabel`. They require the labels to be unique not only within a single target host, but also between the host that does the installation and the one being installed. It is therefore highly advisable (and in some places maybe implicitly expected) that the labels contain a unique host identifier, for example:
```nix
let hash = builtins.substring 0 8 (builtins.hashString "sha256" config.networking.hostName); in # ...
```
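That hash can then be embedded wherever a globally unique name is needed, for example in the ZFS dataset paths (this simply restates the default of the `zfs.dataset` option declared below):
```nix
let hash = builtins.substring 0 8 (builtins.hashString "sha256" config.networking.hostName); in {
  wip.fs.temproot.remote.zfs.dataset = "rpool-${hash}/remote";
}
```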
## Examples
This completely configures the disks, partitions, pool, datasets, and mounts for a ZFS `rpool` on a three-disk `raidz1` with read and write cache on an additional SSD, which also holds the boot partition and swap:
```nix
{ wip.fs.disks.devices = lib.genAttrs ([ "primary" "raidz1" "raidz2" "raidz3" ]) (name: { size = "16G"; }); # Need more than one disk, so they must be declared. When installing to a physical disk, the declared size must match the actual size (or be smaller). The »primary« disk will hold all implicitly created partitions and those not stating a »disk«.
  wip.fs.boot.enable = true; wip.fs.boot.size = "512M"; # See »./boot.nix.md«. Creates a FAT boot partition.
  wip.fs.keystore.enable = true; # See »./keystore.nix.md«. With this enabled, »remote« will automatically be encrypted, with a random key by default.
  wip.fs.temproot.enable = true; # Use ZFS for all categories:
  wip.fs.temproot.temp.type = "zfs";
  wip.fs.temproot.local.type = "zfs";
  wip.fs.temproot.remote.type = "zfs";
  # To change/set the pool's storage layout (see above), adjust the partitions' disks/sizes. Declaring disks requires them to be passed to the system installer.
}
```
On a less beefy system, but also with less data to manage, `tmpfs` works fine for `temp`, and `f2fs` promises to get more performance out of the flash/RAM/CPU:
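A sketch of such a configuration, using only option values declared below (with type `bind`, the `remote` filesystem would additionally need external backup tooling, if there is any `remote` data at all):
```nix
{ wip.fs.temproot.enable = true;
  wip.fs.temproot.temp.type = "tmpfs"; # »/« and the other »temp« locations live in RAM
  wip.fs.temproot.local.type = "bind";
  wip.fs.temproot.local.bind.base = "f2fs"; # automatically create an F2FS as source for the »local« bind mounts
  wip.fs.temproot.remote.type = "bind"; # expects a filesystem at the »remote« bind target, to be backed up externally
}
```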
## Implementation

```nix
bind.source = lib.mkOption { description = "Prefix for the bind mounts' source paths."; type = lib.types.str; default = "/.${type}"; };
bind.base = lib.mkOption { description = "Filesystem to automatically create as the ».source« for the bind mounts."; type = lib.types.enum ([ null ] ++ (lib.optionals (type == "local") [ "f2fs" "f2fs-encrypted" ])); default = null; };
zfs.dataset = lib.mkOption { description = "Dataset path under which to create the ${desc} »${type}« datasets."; type = lib.types.str; default = "rpool-${hash}/${type}"; };
mounts = lib.mkOption {
description = "Locations (for »temp« in addition to »/«) where a ${desc} filesystem should be mounted. Some are declared by default but may be removed by setting them to »null«.";
target = lib.mkOption { description = "Attribute name as the mount target path."; type = lib.types.strMatching ''^/.*[^/]$''; default = name; readOnly = true; };
options = lib.mkOption { description = "Additional mount options to set. Note that not all mount types support all options, they may be silently ignored or cause errors. »bind« supports setting »nosuid«, »nodev«, »noexec«, »noatime«, »nodiratime«, and »relatime«. »zfs« will explicitly heed »noauto«, the other options are applied but may conflict with the ones implied by the ».zfsProps«."; type = lib.types.attrsOf (lib.types.either lib.types.bool lib.types.str); default = { }; };
extraFsConfig = lib.mkOption { description = "Extra config options to set on the generated »fileSystems.*« entry (unless this mount is forced to »null«)."; type = lib.types.attrsOf lib.types.anything; default = { }; };
zfsProps = lib.mkOption { description = "ZFS properties to set on the dataset, if the mount type is »zfs«."; type = lib.types.attrsOf (lib.types.nullOr lib.types.str); default = { }; };
zfsNoSyncProps = { sync = "disabled"; logbias = "throughput"; }; # According to the documentation, »logbias« should be irrelevant without sync (i.e. no logging), but some claim setting it to »throughput« still improves performance.
in {
options.${prefix} = { fs.temproot = {
enable = lib.mkEnableOption "filesystem layouts with ephemeral root";
# Settings for filesystems that will be cleared on reboot:
temp = {
type = lib.mkOption { description = ''
"tmpfs": Creates »tmpfs« filesystems at »/« and all specified mount points.
"zfs": ...
"bind": Expects a filesystem to be mounted at »/«. Creates a hook to cre-create that filesystem on boot (TODO: implement and then enable this), and bind-mounts any additional mounts to ».bind.source+"/"+<mount.source>« (creating those paths if necessary).
# Settings for filesystems that persist across reboots:
local = {
type = lib.mkOption { description = ''
"bind": Expects a (locally persistent) filesystem to be mounted at ».bind.target«, and bind-mounts any additional mounts to ».bind.source+"/"+<mount.source>« (creating those paths if necessary).
# Settings for filesystems that should have remote backups:
remote = {
type = lib.mkOption { description = ''
"bind": Expects a filesystem to be mounted at ».bind.target« that gets backed. Bind-mounts any additional mounts to ».bind.source+"/"+<mount.source>« (creating those paths if necessary).
size = lib.mkOption { description = "Size of the swap partition or file to create."; type = lib.types.nullOr lib.types.str; default = null; };
encrypted = lib.mkOption { description = "Whether to encrypt the swap with a persistent key. Only relevant if ».asPartition = true«."; type = lib.types.bool; default = false; };
asPartition = lib.mkOption { description = "Whether to create a swap partition instead of a file."; type = lib.types.bool; default = cfg.local.type == "zfs"; };
};
persistenceFixes = (lib.mkEnableOption "some fixes to cope with »/« being ephemeral") // { default = true; example = false; };
"/remote" = { source = "system"; mode = "755"; extraFsConfig = { neededForBoot = true; }; }; # if any secrets need to be picked up by »activate«, they should be here
}) (lib.mkIf (cfg.temp.type == "tmpfs") (let type = "temp"; in { # (only temp can be of type tmpfs)
# TODO: this would probably be better implemented by creating a single /.temp tmpfs with a decent size restriction, and then bind-mounting all other mount points into that pool (or at least do that for any locations that are non-root writable?)
# F2FS compresses only for performance and wear. The whole uncompressed space is still reserved (in case the file content needs to get replaced by incompressible data in-place). To free the gained space, »ioctl(fd, F2FS_IOC_RELEASE_COMPRESS_BLOCKS)« needs to be called per file, making the file immutable. Nix could do that when moving stuff into the store.
# TODO: "F2FS and its tools support various parameters not only for configuring on-disk layout, but also for selecting allocation and cleaning algorithms."
boot.initrd.kernelModules = [ "f2fs" ]; # This is not generally, but sometimes, required to boot. Strange. (Kernel message: »request_module fs-f2fs succeeded, but still no fs?«)
fs.zfs.pools.${lib.head (lib.splitString "/" dataset)} = { }; # ensure the pool exists (all properties can be adjusted)
fs.keystore.keys."zfs/${dataset}" = lib.mkIf (type == "remote" && config.${prefix}.fs.keystore.enable) (lib.mkOptionDefault "random"); # the entire point of ZFS remote are backups, and those should be encrypted
fs.zfs.datasets = {
${dataset} = {
mount = false; props = { canmount = "off"; mountpoint = "/"; } // (if type == "temp" then { refreservation = "1G"; } // zfsNoSyncProps else { });
```