---
layout: post
title: "LLMs, Interpolation, and the Ossification of CLIs"
date: 2026-04-02
---

A recent [podcast with Jeremy Howard on Machine Learning Street
Talk](https://www.youtube.com/watch?v=x0Lb_DPJbOc)
crystallized an important point about what LLMs can do
("interpolation") and what they can't ("creative research"):

> "You have to be so nuanced about this stuff because if you say 'they're not
> creative', it can give the wrong idea, because they can do very creative
> seeming things. But if it's like, well, can they really extrapolate outside
> the training distribution? The answer is no, they can't. But the training
> distribution is so big, and the number of ways to interpolate between them is
> so vast, we don't really know yet what the limitations of that is."

He describes how at [Answer.AI](https://www.answer.ai/),
doing novel R&D work constantly pushes him past the
boundary of the training distribution. The LLM goes from
"incredibly clever to, like, worse than stupid, like not
understanding the most basic fundamental premise." Anyone
who has used an LLM for something novel has felt this
cliff.

I don't agree that "interpolation" is all that LLMs can
do. There is some "extrapolation" too.

## In-Context Learning and CLIs

There's a [2022 paper by Garg et
al.](https://arxiv.org/abs/2208.01066) ("What Can Transformers Learn
In-Context?") showing that transformers can effectively learn to run algorithms
like least squares and gradient descent in a single forward pass. They
interpolate from **both** the training data **and** the context.
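
Concretely, for the linear-function case in that paper, the prompt contains
example pairs plus a query input, and the trained transformer's prediction
closely tracks the least-squares fit to the in-context examples. Roughly
(notation mine, not the paper's):

$$
\left(x_1, f(x_1), \ldots, x_k, f(x_k), x_{\text{query}}\right) \mapsto \hat{w}^{\top} x_{\text{query}},
\qquad
\hat{w} = \arg\min_{w} \sum_{i=1}^{k} \left(w^{\top} x_i - f(x_i)\right)^2
$$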

This is why the quality of tool output matters so much: tool output
also lands in the context. When a CLI tool returns a clear error message that
explains what went wrong and how to fix it, the LLM can interpolate from that
context even if the specific failure mode was never in the training set. The
combination of non-deterministic reasoning (the LLM) with deterministic
feedback (the tool output) is remarkably powerful. So it's not just
interpolating training data; it's also extrapolating into the actual use case.
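
As a sketch of what that deterministic feedback can look like, here's a
made-up CLI (`mytool` and its flags are hypothetical) emitting the kind of
error an LLM can act on:

```bash
$ mytool deploy --region eu-west
Error: unknown region "eu-west"
Hint: did you mean "eu-west-1"? Run "mytool regions list" to see all valid regions.
```

The fix is spelled out in the output itself, so even a model that has never
seen `mytool` can recover from the failure.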

For the most popular tools -- `git`, `npm`, `cargo` -- maybe that's already
well understood. The training data is saturated with examples, and the LLM
knows what to do. But for less common tools, the quality of the error output
becomes the difference between the LLM solving the problem and the LLM
spiraling into nonsense.

## The Headscale Problem

I've seen this happen repeatedly with
[Headscale](https://github.com/juanfont/headscale), the
open-source control server for Tailscale. If you want to
run Tailscale completely self-hosted without relying on
Tailscale's coordination servers, Headscale is great.

In [v0.26.0](https://github.com/juanfont/headscale/releases/tag/v0.26.0) (May
2025), the route management CLI was completely rewritten. The old syntax:

```bash
headscale routes list
headscale routes enable -r <route_id>
```

became:

```bash
headscale nodes list-routes
headscale nodes approve-routes --identifier <node_id> --routes <CIDR,...>
```

The `routes` subcommand was removed, and route acceptance became route
approval. The mental model changed from enabling route IDs to approving CIDR
blocks per node, and the behavior also changed: `approve-routes` replaces *all*
approved routes with whatever you pass, so if you only specify one route, you
lose the rest.
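
That replace-all semantics is easy to trip over. A sketch of the safe pattern
(node ID and routes are made up): to add one route to a node that already has
two approved, you have to pass all three:

```bash
# Node 1 already has 10.0.0.0/24 and 192.168.1.0/24 approved.
# Passing only the new route would drop the other two, so pass the full set:
headscale nodes approve-routes --identifier 1 --routes 10.0.0.0/24,192.168.1.0/24,172.16.0.0/24
```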

Every time I ask Claude Code to work with Headscale, it tries the old syntax. And it
tries **really hard**. It generates `headscale routes enable`, gets an error,
and then tries variations of the old syntax rather than recognizing that the
command structure has fundamentally changed. Eventually it finds the GitHub
issue and understands the problem. The old syntax is baked deep into the
training distribution, and the new syntax barely exists there yet, if at all.

## The Ossification Risk

This creates a subtle pressure toward software ossification. If every CLI
change breaks the LLM-assisted workflow for thousands of users, there's a real
incentive to never change anything. The training data becomes a form of
technical debt that the entire ecosystem inherits.

What can be done?

First, **good error messages matter more than ever.** When the old syntax is
used, the tool should explain what changed and show the new equivalent. This
gives the LLM the context it needs to adapt. Headscale does do some of this,
but not quite enough. Its error message says `Error: unknown
command "routes" for "headscale"`, whereas it should say something like `Error:
command "routes" has been superseded by "nodes list-routes" and "nodes
approve-routes"` to give these eager and stubborn LLMs a chance.

Second, **versioned command documentation in the output** helps a lot. If
the error message or `--help` output includes migration notes or links to
changelogs, the LLM can pick those up from the context window.
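
For example, a `--help` footer along these lines (hypothetical output, not
what Headscale prints today) would land exactly where the agent is looking:

```bash
$ headscale nodes approve-routes --help
Approve subnet routes advertised by a node.

Note: this replaces the node's full set of approved routes.
Migration: supersedes "headscale routes enable" (removed in v0.26.0).
Changelog: https://github.com/juanfont/headscale/releases/tag/v0.26.0
```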

Finally, **accepting old syntax with deprecation warnings** is the gentlest
path, and maybe mandatory now. Keep the old commands working but print a
warning with the new equivalent. Do not introduce breaking changes!
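
For Headscale, that gentler path could have looked like this (hypothetical
output; the subcommand was in fact removed):

```bash
$ headscale routes enable -r 5
Warning: "headscale routes" is deprecated and will be removed in a future release.
Use "headscale nodes approve-routes --identifier <node_id> --routes <CIDR,...>" instead.
```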

[Ben Thompson](https://stratechery.com/2026/agents-over-bubbles/) argues that
the most important moat Anthropic currently has is the combination of model,
harness, and a model trained for that harness. A good CLI acts as a "skill" for
the harness in real time, creating very valuable signal to course-correct the
agent. The better the tools communicate, the further the LLM can extrapolate
beyond its training data, and the fewer tokens it uses. As LLMs become a
primary interface to CLIs, the quality of that communication becomes a
first-class design concern.

## Other people saying similar things

- [I Improved 15 LLMs at Coding in One Afternoon. Only the Harness
Changed](https://blog.can.ac/2026/02/12/the-harness-problem/): It's better to
think of "the AI" as the whole cybernetic system of feedback loops joining the
LLM and its harness, because the harness can make as much of a difference as
improvements to the model itself.
- [The Unreasonable Effectiveness of an LLM Agent Loop with Tool
Use](https://sketch.dev/blog/agent-loop): On how astonishingly well a loop with
an LLM that can call tools works for all kinds of tasks.
- [A strong commitment to backwards compatibility means keeping your
mistakes](https://utcc.utoronto.ca/~cks/space/blog/tech/MistakesAndBackwardCompatibility):
The eternal tension between stability and the ability to fix design errors.
