You may be aware of Nix or NixOS. Users love them for being superior tools for building, deploying, and managing software. Yet, Nix is generally perceived as notoriously hard to learn.
The core Nix ecosystem consists of several distinct components:
- Nix is a build system and package manager that comes with a configuration language to declare software components, the Nix language.
- Software made available through Nix is centrally maintained in a massive package collection called Nixpkgs.
- There also exists a Linux distribution called NixOS, which is based on Nix and Nixpkgs.
In an attempt to provide an alternative learning approach, this article discusses Nix and its underlying principles in the context of the history of computing. The condensed findings presented here reflect only part of our ongoing community effort¹, started this year, to improve documentation and make the benefits of Nix more accessible to software developers, and eventually computer users in general.
Everything is a… what?
Nix is not new. It has been under active development since 2003.
While touted as the purely functional package manager, one cannot say that Nix introduces a fundamentally new paradigm. Functional programming goes back to John McCarthy’s Lisp (1962), rooted in Alonzo Church’s lambda calculus (1936), where everything is a function.
It is not even a new idea for Nix to propose parting ways with one of the most pervasive skeuomorphisms in computing, the file system, which naturally followed from an era where everything was a piece of paper.
Ken Thompson and Dennis Ritchie inherited the novelty of a hierarchical file system from Multics, one of Unix’s predecessors, and firmly assumed it as a given by 1974:
The most important job of UNIX is to provide a file system.
— The UNIX Time-Sharing System (1974)
Although it is arguably a severely limiting abstraction², it remains largely unquestioned as a cornerstone of software development practice³. The rise of object-oriented programming brought about a number of experimental systems⁴ where everything is an object – an idea attributed to Alan Kay’s Smalltalk (1972) – but none of them saw mass adoption.
Part of the Unix philosophy later even turned into the malapropism everything is a file. Linus Torvalds clarified in various public emails that it was really about small, composable tools operating on uniform interfaces, not the specific mapping of names to contents:
The whole point with “everything is a file” is not that you have some random filename (indeed, sockets and pipes show that “file” and “filename” have nothing to do with each other), but the fact that you can use common tools to operate on different things.
— Linus Torvalds (2002)
The UNIX philosophy is often quoted as “everything is a file”, but that really means “everything is a stream of bytes”.
— Linus Torvalds (2007)
Rob Pike and Ken Thompson have further pursued the design of an entire system around a hierarchy of named files with Plan 9 since the 1980s, culminating in what Torvalds phrased as “everything is a namespace”:
It may be unnatural to the Plan-9 way of “everything is a namespace”, but that was never the UNIX way. The UNIX way is “everything is a file descriptor or a process”, but that was never about namespaces.⁵
— Linus Torvalds (2002)
Today’s most widely used operating systems (Linux, XNU, and Windows NT) all have file systems at their core. Nix is special not so much because it radically calls that into question with its purely functional approach, but because it pragmatically offers an intriguing shift in perspective:
What if we could continue developing and using all our software (mostly) as it is, and (mostly) stop bothering with file names, paths, and directories when building and deploying it?
An intriguing shift in perspective
The key insight behind Nix is that the problem of software deployment can be seen through the lens of programming language theory. The idea was first put forward by Eelco Dolstra et al. in Imposing a Memory Management Discipline on Software Deployment (2004). In his PhD thesis The Purely Functional Software Deployment Model (2006), Dolstra proposed that we can treat the file system in an operating system like memory in a running program, and equate package management to memory management. With Nix, he showed how to apply proven solutions, such as garbage collection or disallowing arbitrary manipulation of pointers (also known as pointer discipline), to the perennial struggle of making software work reliably.
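To make the memory management analogy more tangible, here is a minimal Python sketch of mark-and-sweep garbage collection over a toy store. The `store` dictionary, the path names, and the helpers `reachable` and `collect_garbage` are hypothetical and only model the idea; they are not how Nix's garbage collector is implemented.

```python
# Toy model: each store path maps to the set of store paths it references.
# All paths and names are invented for illustration.
store = {
    "/store/aaa-libc": set(),
    "/store/bbb-shell": {"/store/aaa-libc"},
    "/store/ccc-app": {"/store/bbb-shell", "/store/aaa-libc"},
    "/store/ddd-old-app": {"/store/aaa-libc"},
}


def reachable(store, roots):
    """Mark phase: return every path reachable from the roots."""
    seen = set()
    stack = list(roots)
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        stack.extend(store[path])
    return seen


def collect_garbage(store, roots):
    """Sweep phase: delete unreachable paths and return them, sorted."""
    live = reachable(store, roots)
    dead = sorted(path for path in store if path not in live)
    for path in dead:
        del store[path]
    return dead


# Only the current application is a root; the old one becomes garbage.
removed = collect_garbage(store, roots={"/store/ccc-app"})
print(removed)
```

Because references between objects are known, anything not reachable from a root can be deleted safely, just like unreachable objects on a heap.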
The following figure illustrates the analogy of memory structures in programs and operating systems down to single objects.
As an example of this equivalence, take the following shell script:
#!/usr/bin/sh
echo hello
It contains a reference to /usr/bin/sh. That file path is just like a mutable pointer to a mutable variable:
- The path itself can be changed to reference a different file or one that does not even exist.
- The contents of /usr/bin/sh can be changed or the file deleted entirely.
This makes it hard to reason about the overall system state – the same problem as for program state in an imperative programming language.
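The pointer analogy can be played through in a few lines of Python. This is only a sketch: the `filesystem` dictionary, its contents, and the `dereference` helper are made up to stand in for a real file system.

```python
# Toy file system: a mutable mapping from path names to file contents.
# The contents are invented for illustration.
filesystem = {"/usr/bin/sh": "shell v1"}

# The script stores only the path, i.e. a "pointer" into the file system.
script_interpreter = "/usr/bin/sh"


def dereference(fs, path):
    """Look up a path; return None for a dangling reference."""
    return fs.get(path)


before = dereference(filesystem, script_interpreter)

# Someone overwrites the file: same path, different contents.
filesystem["/usr/bin/sh"] = "shell v2"
after_update = dereference(filesystem, script_interpreter)

# Someone deletes the file: the path now dangles.
del filesystem["/usr/bin/sh"]
after_delete = dereference(filesystem, script_interpreter)

print(before, after_update, after_delete)
```

The script never changed, yet what it refers to changed twice. Nothing in the script itself reveals that.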
More details are elaborated in the table below (based on Figure 3.1 in The Purely Functional Software Deployment Model, p. 55).
| Programming Language Domain | Deployment Domain |
|---|---|
| memory | disk |
| value, object | file |
| address | path name |
| pointer dereference | file access |
| pointer arithmetic | string operations |
| dangling pointer | path to absent file |
| object graph | dependency graph |
| calling constructed object with reference to other object | runtime dependency |
| calling constructor with reference to other object, not stored | build-time dependency |
| calling constructor with reference to other object, stored | retained dependency |
| languages without pointer discipline (e.g. assembler) | typical Unix-style deployment |
| languages with enough pointer discipline to support conservative garbage collection (e.g. C, C++) | Nix |
| languages with full pointer discipline (e.g. Java, Haskell) | as-yet unknown deployment style not enabled by contemporary operating systems |
This notion was further refined by Andrey Mokhov et al. in Build Systems à la Carte (2018), from a slightly different angle: distilling the essential features of build systems shows that building software can also be seen through the lens of programming language theory. It is really about applying functions to arbitrary values, which happen to be files in a file system; some of these files end up being run as processes. Again, proven solutions like memoization and self-adjusting computation offer themselves, this time, for tackling the perennial problem of long compilation times.
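As a rough illustration of memoization applied to builds, the following Python sketch skips a build whenever the same task has already run on the same inputs. The names `cache`, `fingerprint`, `build`, and `compile_source` are hypothetical, and the "compiler" is a trivial stand-in; real build systems track inputs in far more detail.

```python
import hashlib

cache = {}      # maps an input fingerprint to a build result
build_log = []  # records which builds actually ran


def fingerprint(task_name, inputs):
    """Hash the task name together with all input contents."""
    digest = hashlib.sha256(task_name.encode())
    for content in inputs:
        digest.update(b"\0" + content.encode())
    return digest.hexdigest()


def build(task_name, function, inputs):
    """Run `function` on `inputs` unless an identical build is cached."""
    key = fingerprint(task_name, inputs)
    if key not in cache:
        build_log.append(task_name)
        cache[key] = function(*inputs)
    return cache[key]


def compile_source(source):
    # Stand-in for an expensive compiler run.
    return source.upper()


first = build("compile", compile_source, ["int main() {}"])
second = build("compile", compile_source, ["int main() {}"])  # cache hit
third = build("compile", compile_source, ["int main() { return 1; }"])
print(len(build_log))
```

Reusing the cached result is only sound if `compile_source` is a pure function of its inputs, which is why purity matters so much for build systems.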
Treating both the building and the deployment of software components as if they were values in a program’s call graph clearly shows the benefits and power of purely functional programming: ensuring correctness while allowing flexible composition and automatic optimizations.
In the case of Nix, this enables reproducible builds and deployments, comfortable construction of packages and their variants from reusable building blocks, and features such as transparent binary caching.
The following table shows equivalence in terminology between build systems and programming language theory:
| Nix | Bazel | Build Systems à la Carte | programming language |
|---|---|---|---|
| store object | artifact | value | value |
| builder | (depends on action type) | function | function |
| derivation | action | Task | thunk |
| derivation graph | action graph, build graph | Tasks | call graph |
| build | build | application of Build | evaluation |
| store | action cache | Store | heap |
What Nix has been doing successfully since 2003 is encoding the place-oriented paradigm of files and processes in terms of a dataflow-oriented programming language, and hooking its evaluation results back into the operating system.
Maybe surprisingly, that programming language is not the Nix language. Rather, Nix uses what we may call the derivation language, for lack of a better term.
The derivation language is a key mechanism in Nix, but users are rarely exposed to it. The Nix language itself is merely syntactic sugar that helps us encode objects and their relations (i.e., values and functions) as expressions in the derivation language.
Programs written in the derivation language transform build inputs into build results. These programs use part of the file system as memory, and their memory objects are files. Nix calls this part of the file system the Nix store. To run programs written in the derivation language, we evaluate them with the build scheduler.
The following example is a most basic⁶ Nix language expression:
derivation {
  name = "example";
  builder = /bin/sh;
  args = [ "-c" "echo hello > $out" ];
  system = builtins.currentSystem;
}
It declares what Nix calls a derivation: a precise description of how contents of existing files are used to derive new files.
The build instructions encoded in this derivation create a file with contents hello. This does the same thing as capturing the output of the shell script example above.
The main difference is that, with Nix, repeated executions of these build instructions will always produce the same result, regardless of what happens to the original input files. In addition, changing any of the parameters of a derivation will produce a distinctly different result that cannot be mistaken for the original one. Nix achieves this by copying all input files to the Nix store, where they cannot change, and always working with these immutable copies that are identified by their content hash.⁷ The build result itself also gets a unique name, which is based on the hashes of all the build inputs and parameters.
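The naming idea can be sketched in Python as follows. This is a simplified model under stated assumptions: `store_path` and `output_path` are hypothetical helpers, and plain SHA-256 truncated to 32 hex digits stands in for Nix's actual hashing and encoding scheme, which differs in its details.

```python
import hashlib

STORE_DIR = "/nix/store"  # directory name taken from the article


def store_path(name, content):
    """Derive a store path from a content hash and a readable name.

    Assumption: truncated SHA-256 in hex, not Nix's real scheme.
    """
    digest = hashlib.sha256(content).hexdigest()[:32]
    return f"{STORE_DIR}/{digest}-{name}"


def output_path(name, builder_path, args, system):
    """Name a build result after everything that goes into the build."""
    recipe = "\n".join([name, builder_path, *args, system]).encode()
    return store_path(name, recipe)


# Two different shells get two different, immutable store paths.
shell_v1 = store_path("sh", b"shell binary, version 1")
shell_v2 = store_path("sh", b"shell binary, version 2")

args = ["-c", "echo hello > $out"]
out_v1 = output_path("example", shell_v1, args, "x86_64-darwin")
out_v1_again = output_path("example", shell_v1, args, "x86_64-darwin")
out_v2 = output_path("example", shell_v2, args, "x86_64-darwin")

print(out_v1 == out_v1_again, out_v1 == out_v2)
```

Since the builder's store path is part of the recipe, a change to the shell's contents propagates into the name of the build result, so the two results can never be confused.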
A side effect of evaluating the above expression with nix-instantiate is the creation of the following build task:
{
  "/nix/store/ccdzzm0mzmavzmf8vyr6wx95ihm2lpzr-example.drv": {
    "outputs": {
      "out": {
        "path": "/nix/store/spvfs5qfrf113ll4vhcc5lby4gqmc532-example"
      }
    },
    "inputSrcs": ["/nix/store/wsziwdqamp7mx03mdwciyhs7z733dlik-sh"],
    "inputDrvs": {},
    "system": "x86_64-darwin",
    "builder": "/nix/store/wsziwdqamp7mx03mdwciyhs7z733dlik-sh",
    "args": ["-c", "echo hello > $out"],
    "env": {
      "builder": "/nix/store/wsziwdqamp7mx03mdwciyhs7z733dlik-sh",
      "name": "example",
      "out": "/nix/store/spvfs5qfrf113ll4vhcc5lby4gqmc532-example",
      "system": "x86_64-darwin"
    }
  }
}
Nix calls this structure a store derivation: a build task with unambiguously specified dependencies, persisted in the Nix store.
Note how the builder is not /bin/sh any more, but a file in /nix/store, uniquely identified by the hash of its contents. The file system path outputs.out.path will be populated when the derivation is built, and would be different if we changed any parameter to derivation – or the contents of /bin/sh – before evaluating the Nix expression.
The unwieldy syntax and the specifics of wiring up the build execution with env and args are rather arbitrary and have historical reasons. What matters here is that this construction has properties of a dataflow programming language:
- Dataflow oriented: Build tasks can be composed. The build result of one can be used as build input for another. The order of operations is determined by data dependencies, and otherwise irrelevant.
- Pure: The builder will always produce the same result for the same inputs. Assuming the builder process is sufficiently isolated from its host system, the transformation it performs on its arguments acts like a pure function.
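Both properties can be illustrated with a small Python sketch, using `graphlib` from the standard library (Python 3.9 or later). The task names, the `dependencies` mapping, and the `build_task` function are hypothetical; the point is only that the execution order falls out of the data dependencies, and that each result depends on nothing but its inputs.

```python
from graphlib import TopologicalSorter

# Each hypothetical build task lists the tasks whose results it consumes.
dependencies = {
    "compiler": set(),
    "library": {"compiler"},
    "program": {"compiler", "library"},
}


def build_task(name, inputs):
    """Pure stand-in for a builder: the result depends only on its inputs."""
    return f"{name}({', '.join(inputs)})"


results = {}
# The order is derived from data dependencies, not written down by hand.
order = list(TopologicalSorter(dependencies).static_order())
for name in order:
    inputs = [results[dep] for dep in sorted(dependencies[name])]
    results[name] = build_task(name, inputs)

print(order)
print(results["program"])
```

Tasks without a dependency between them could run in any order, or in parallel, without affecting the results.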
Following the analogy of build systems and programming languages, this diagram illustrates the derivation as pure data transformation:
Evaluating Nix language expressions only produces build tasks. Evaluating the build tasks produces build results. The Nix package manager’s command line tools in turn allow exposing build results to the Unix environment.
NixOS pushes this idea to the limit by capturing as much operating system state as possible into the realm of declarative programming.
The following diagram shows a drastic simplification of how Nix interacts with the operating system: It uses files as function inputs, and outputs are files again. On the operating system, files can be run as processes, which, in turn, operate on files. A build function also amounts to an operating system process (not depicted).
What next?
Since its inception, Nix development has been primarily occupied with imposing the abstraction of functional programming onto the messy, real world of our Unix lineage: encoding and correctly dealing with object references in the file system, ensuring purity of function application, and working around built-in assumptions behind the mechanisms of different language ecosystems and build procedures – all while keeping performance acceptable.
Despite numerous rough edges remaining due to the enormous scope of the undertaking, Nix, Nixpkgs, and NixOS have been working products for many years. Currently there is much work in progress to improve the user experience by presenting a more consistent command line interface and better error messages.
However, something much more interesting lies in the long term. Which other results from programming language theory and mathematics will we be able to leverage to make software build quickly, work reliably, and further tame Unix?
For example, what if the derivation language was not only pure, but also functional, to use derivations as build inputs?⁸ What if it also had types, to describe constraints to composing packages and configurations?
Nix raises the question: what if everything on our computers was, in fact, a computer program?
Edited 2022-11-09: Explained more precisely the effects of changing inputs and parameters of derivations.
Edited 2022-08-29: Expanded on the derivation language, added examples and explanations. Originally it was only briefly mentioned as a key mechanism underlying Nix.
Thanks to Ian Henry (@ianthehenry) for detailed feedback and specifically for pointing out that gap.
Edited 2022-09-22: Renamed nixpkgs to Nixpkgs to follow naming conventions and avoid confusion.