The core concept of NixOS is to maintain the hosts' programs and configuration in a stateless manner.
Hand in hand with that goes the need for (but also the advantage of) identifying anything stateful in the system.
Such state generally falls into one of three categories:
1) It is secret and/or can not be re-generated (without other information of the same kind).
2) It may or may not be secret, but re-generating it takes time or is impractical.
3) It may or may not be secret and can be re-generated quickly.
Category 1 data can not be included in the nix store and thus can not (directly) be derived from the system configuration.
It needs to be stored in a way that the host (and only the host) can access it and that it can't be lost.
It is therefore referred to as `remote` data (even though it would usually also be stored locally).
Category 2 data is required for the system to boot (in reasonable time), and should thus, as `local` data, be stored persistently on the host.
Parts that are not secret should be included in or generated from the system configuration.
Anything that is secret likely shouldn't be lost (i.e. is actually category 1) or can be re-generated from randomness or category 1 (and is thus category 3).
Category 3 data is expendable, and, as `temp` data, can thus be cleared on reboot or at other times.
Temporary data is the cheapest to (not) maintain, esp. in terms of administrative overhead, and should be used wherever possible.
However, incorrectly assigning data to `temp` that should be `local` or `remote` may break the system or cause data loss.
TempRoot is the concept of defaulting everything to `temp` and selectively whitelisting things as `local` or `remote`, by mounting an ephemeral root (`/`) file system and mounting/binding/linking various, sometimes many, nested paths to persistent storage.
This module implements the concept with different filesystem options for `remote`, `local` and `temp` data, while maintaining the same general mount structure regardless, which should make the choice of backing storage largely transparent for anything running on the system.
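To make the resulting mount structure concrete, the following is a minimal sketch in plain NixOS options, not this module's actual output. The backing paths `/.local` and `/.remote` and the chosen mount points are hypothetical, and both the file systems mounted at those backing paths and the ephemeral `/` itself are assumed to be declared elsewhere:

```nix
{
  # »local«: needed to boot in reasonable time, but could be re-generated.
  fileSystems."/nix" = { device = "/.local/nix"; fsType = "none"; options = [ "bind" ]; };
  fileSystems."/var/log" = { device = "/.local/var/log"; fsType = "none"; options = [ "bind" ]; };
  # »remote«: can not be re-generated and therefore has to be backed up.
  fileSystems."/home" = { device = "/.remote/home"; fsType = "none"; options = [ "bind" ]; };
  # Everything not listed here stays on the ephemeral »/« and is lost on reboot.
}
```

Since only the `device` paths refer to the backing storage, swapping what is mounted at `/.local` or `/.remote` does not change anything that programs on the system get to see.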
ZFS is capable of serving all three roles quite well.
Its pooling and datasets allow seamless allocation of storage between the categories.
File-system level encryption and the sending of encrypted, incremental dataset snapshots make it excellent for the `remote` role.
As backend for `local`, which primarily holds the nix store, it benefits from transparent compression and checksumming.
With `fsync` disabled, and the ability to roll back snapshots, it also works to create very large storage areas for `temp` data.
ZFS does struggle on lower-end systems, though. BtrFS could probably be configured to serve the roles with similar capability.
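As an illustration of the `temp` role on ZFS, the rollback pattern could be sketched in plain NixOS as below. This is not this module's implementation. The dataset name `rpool/temp` is hypothetical, and the dataset is assumed to have been created with `mountpoint=legacy` and `sync=disabled` and to have been snapshotted as `@empty` while it was still empty. The hook used here belongs to the script-based (non-systemd) initrd, and which hook runs after the pool import depends on the NixOS release (older releases import pools in `boot.initrd.postDeviceCommands`):

```nix
{ lib, ... }: {
  # Mount the (hypothetical) ephemeral dataset as root.
  fileSystems."/" = { device = "rpool/temp"; fsType = "zfs"; };

  # Discard everything written since the »@empty« snapshot, before »/« gets mounted.
  # »lib.mkAfter« orders this after the commands that import the pool.
  boot.initrd.postResumeCommands = lib.mkAfter ''
    zfs rollback -r rpool/temp@empty
  '';
}
```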
F2FS also supports checksumming and compression, though it currently does not automatically reclaim space gained by the latter (and "manually" enabling it per file prevents `mmap`ing those files).
This and its design optimized for flash storage should make it an optimal backend for the `local` data, esp. on lower-end hardware.
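A sketch of what such an F2FS-backed `local` file system could look like in plain NixOS follows. The partition label and the mount point are hypothetical, the mount options are the ones documented for the kernel's F2FS driver, and the file system is assumed to have been created with the `extra_attr` and `compression` features enabled (for example via `mkfs.f2fs -O extra_attr,compression`):

```nix
{
  fileSystems."/.local" = {
    device = "/dev/disk/by-partlabel/local-0a1b2c3d"; # hypothetical label including a host identifier
    fsType = "f2fs";
    options = [
      "compress_mode=fs"       # let the file system itself (not user space) trigger compression
      "compress_extension=*"   # consider all files for compression
      "compress_algorithm=lz4" # compress using LZ4
      "compress_chksum"        # verify checksums of the raw data in compressed clusters
      "noatime"
    ];
    neededForBoot = true; # the Nix store is assumed to live on this file system
  };
}
```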
EXT4 supports checksumming only for metadata, and does not support compression. Block device layers could in principle be used for this.
Using a simple filesystem with external backup tools is possible yet suboptimal for `remote` data, unless the system doesn't actually have any/much of it.
As long as the amount of `temp` data can be expected to stay within reasonable bounds, `tmpfs`es and swap can also be used to back the `temp` data.
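In plain NixOS terms, bounding a `tmpfs` root and giving it swap to spill into could be sketched like this. The size limit and the partition label are arbitrary, hypothetical values, and this is not necessarily how this module declares them:

```nix
{
  # Ephemeral root in RAM; »size« caps how much »temp« data can accumulate.
  fileSystems."/" = { device = "tmpfs"; fsType = "tmpfs"; options = [ "mode=755" "size=4G" ]; };

  # Pages of a tmpfs can be swapped out, so swap effectively extends the »temp« storage.
  swapDevices = [ { device = "/dev/disk/by-partlabel/swap-0a1b2c3d"; } ];
}
```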
The disk/partition declaration and the installer tooling refer to disks by GPT `partlabel`. They require the labels to be unique not only within a single target host, but also between the host that does the installation and the one being installed. It is therefore highly advisable (and in some places maybe implicitly expected) that the labels contain a unique host identifier, for example:
```nix
let hash = builtins.substring 0 8 (builtins.hashString "sha256" config.networking.hostName); in # ...
```
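For instance, such an identifier could be used to derive labels as sketched below. The host name and the label prefixes are made up for illustration. GPT partition names are limited to 36 characters, which a short prefix plus the eight-character hash stays well within:

```nix
let
  hostName = "example-host"; # hypothetical; inside a module this would be »config.networking.hostName«
  hash = builtins.substring 0 8 (builtins.hashString "sha256" hostName);
in {
  # Labels that stay unique across hosts, as long as the host names differ.
  boot = "boot-${hash}";
  swap = "swap-${hash}";
  pool = "rpool-${hash}";
}
```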
## Examples
This completely configures the disks, partitions, pool, datasets, and mounts for a ZFS `rpool` on a three-disk `raidz1` with read and write cache on an additional SSD, which also holds the boot partition and swap:
{ setup.disks.devices = lib.genAttrs ([ "primary" "raidz1" "raidz2" "raidz3" ]) (name: { size = "16G"; }); # Need more than one disk, so must declare them. When installing to a physical disk, the declared size must match the actual size (or be smaller). The »primary« disk will hold all implicitly created partitions and those not stating a »disk«.
setup.bootpart.enable = true; setup.bootpart.size = "512M"; # See »./boot.nix.md«. Creates a FAT boot partition.
# Change/set the pool's storage layout (see above), then adjust the partitions' disks/sizes. Declaring disks requires them to be passed to the system installer.
On a less beefy system, but also with less data to manage, `tmpfs` works fine for `temp`, and `f2fs` promises to get more performance out of the flash/ram/cpu:
#setup.keystore.keys."luks/rpool-${hash}/0" = "random"; # Would also enable LUKS encryption of the pool, but there isn't too much point in encrypting twice.
zfs.dataset = lib.mkOption { description = "Dataset path under which to create the ${desc} »${type}« datasets."; type = lib.types.str; default = "rpool-${hash}/${type}"; };
description = "Locations ${if type == "temp" then "(in addition to »/«)" else ""} where a ${desc} filesystem should be mounted. Some are declared by default but may be removed by setting them to »null«.";
target = lib.mkOption { description = "Attribute name as the mount target path."; type = lib.types.strMatching ''^/.*[^/]$''; default = name; readOnly = true; };
options = lib.mkOption { description = "Additional mount options to set. Note that not all mount types support all options; they may be silently ignored or cause errors. »bind« supports setting »nosuid«, »nodev«, »noexec«, »noatime«, »nodiratime«, and »relatime«. »zfs« will explicitly heed »noauto«, the other options are applied but may conflict with the ones implied by the ».zfsProps«."; type = lib.types.attrsOf (lib.types.oneOf [ lib.types.bool lib.types.str lib.types.int ]); default = { }; };
extraFsConfig = lib.mkOption { description = "Extra config options to set on the generated »fileSystems.*« entry (unless this mount is forced to »null«)."; type = options.fileSystems.type.nestedTypes.elemType // { visible = "shallow"; }; default = { }; };
zfsProps = lib.mkOption { description = "ZFS properties to set on the dataset, if mode type is »zfs«. Note that ZFS mounts made in the initramfs don't have the correct mount options from ZFS properties, so properties that affect mount options should (also) be set as ».options«."; type = lib.types.attrsOf (lib.types.nullOr lib.types.str); default = { }; };
zfsNoSyncProps = { sync = "disabled"; logbias = "throughput"; }; # According to the documentation, »logbias« should be irrelevant without sync (i.e. no logging), but some claim setting it to »throughput« still improves performance.
"zfs": Creates a ZFS dataset for »/« and each specified mount point, as (nested) children of ».zfs.dataset«, which will have »sync« disabled. Also adds a pre-mount command that rolls back all children of that dataset to their »@empty« snapshots (which are taken right after the datasets are created).
"bind": Expects a filesystem to be mounted at »/«. Creates a hook to re-create that filesystem on boot (TODO: implement and then enable this), and bind-mounts any additional mounts to ».bind.source+"/"+<mount.source>« (creating those paths if necessary).
The type of filesystem that holds the system's files that need to (e.g. the Nix store) or should (e.g. caches, logs) be kept across reboots, but that can be regenerated or are not worth backing up:
"bind": Expects a (locally persistent) filesystem to be mounted at ».bind.target«, and bind-mounts to ».bind.source+"/"+<mount.source>« (creating those paths if necessary). ».bind.base« can be used to automatically mount different default filesystems at ».bind.target«.
"zfs": Creates a ZFS dataset for »/« and each specified mount point, as (nested) children of ».zfs.dataset«.
The type of filesystem that holds the system's files that need to be backed up (which some external mechanism should then do):
"bind": Expects a filesystem to be mounted at ».bind.target« that gets backed up. Bind-mounts to ».bind.source+"/"+<mount.source>« (creating those paths if necessary).
"zfs": Creates a ZFS dataset for »/« and each specified mount point, as (nested) children of ».zfs.dataset«.
size = lib.mkOption { description = "Size of the swap partition or file to create."; type = lib.types.nullOr lib.types.str; default = null; };
encrypted = lib.mkOption { description = "Whether to encrypt the swap with a persistent key. Only relevant if ».asPartition = true«."; type = lib.types.bool; default = false; };
asPartition = lib.mkOption { description = "Whether to create a swap partition instead of a file."; type = lib.types.bool; default = cfg.local.type == "zfs"; };
};
persistenceFixes = (lib.mkEnableOption "some fixes to cope with »/« being ephemeral") // { default = true; example = false; };
"/remote" = { source = "system"; mode = "755"; extraFsConfig = { neededForBoot = lib.mkDefault true; }; }; # if any secrets need to be picked up by »activate«, they should be here
}) (lib.mkIf (cfg.temp.type == "tmpfs") (let type = "temp"; in { # (only temp can be of type tmpfs)
# TODO: this would probably be better implemented by creating a single /.temp tmpfs with a decent size restriction, and then bind-mounting all other mount points into that pool (or at least do that for any locations that are non-root writable?)
}) (lib.mkIf (cfg.local.type == "bind" && (cfg.local.bind.base != null)) (let # Convenience option to create a local F2FS/EXT4 optimized to host the nix store:
# F2FS compresses only for performance and wear. The whole uncompressed space is still reserved (in case the file content needs to get replaced by incompressible data in-place). To free the gained space, »ioctl(fd, F2FS_IOC_RELEASE_COMPRESS_BLOCKS)« needs to be called per file, making the file immutable. Nix could do that when moving stuff into the store.
compress_mode = "fs"; # enable compression for all files
compress_algorithm = "lz4"; # compress using lz4
compress_chksum = true; # verify checksums (when decompressing data blocks?)
# TODO: "F2FS and its tools support various parameters not only for configuring on-disk layout, but also for selecting allocation and cleaning algorithms."
boot.initrd.kernelModules = lib.mkIf (config.fileSystems?${cfg.local.bind.source}) [ config.fileSystems.${cfg.local.bind.source}.fsType ]; # This is not generally, but sometimes, required to boot. Strange. (Kernel message: »request_module fs-f2fs succeeded, but still no fs?«)
systemd.tmpfiles.rules = [ (lib.fun.mkTmpfile { type = "L+"; path = "/remote"; argument = "/local"; }) ]; # for compatibility (but use a symlink to make clear that this is not actually a separate mount)
zfs.pools.${lib.head (lib.splitString "/" dataset)} = { }; # ensure the pool exists (all properties can be adjusted)
keystore.keys."zfs/${dataset}" = lib.mkIf (type == "remote" && config.${setup}.keystore.enable) (lib.mkOptionDefault "random"); # the entire point of ZFS remote is backups, and those should be encrypted