Self-adjusting computation in 100 lines of TypeScript
Self-adjusting computation (SAC) is a technique for breaking up a complex, expensive-to-execute task into a graph of smaller subtasks which can be computed incrementally. The advantage of this is that re-running the task after making changes is much faster than running it from scratch, because only the parts of the graph that have changed, and any nodes that depend on them, are actually re-executed. Right now, for whatever reason, there’s a sudden vogue for reactive state management in JavaScript developer circles. This topic touches directly on SAC, so I thought I’d go over how these systems can be built in practice.
This paper, miniAdapton: A Minimal Implementation of Incremental Computation in Scheme, was very helpful when I was trying to understand the fundamentals of SAC. It describes a working incremental system in just a few functions. I don’t find Lisp syntax particularly easy to read, so I decided to rewrite it to make sure I actually understood it.
I used TypeScript, because I think a type system makes the shape of the data clearer, though as you’ll see towards the end of the post it can be a bit of an awkward fit when porting the extremely dynamic Scheme implementation.
Memoisation
The paper starts by defining SAC as essentially a variation on memoisation. Memoisation is a technique for speeding up computation by saving intermediate results that would otherwise be recomputed many times. The canonical example is calculating the Fibonacci sequence. Take the following recursive function:
function fibonnaci(n: number): number {
if (n <= 1) {
return n;
}
return fibonnaci(n - 1) + fibonnaci(n - 2);
}
expect(fibonnaci(10)).toBe(55);
This gets very slow, very quickly with increasing values of n, but we can take advantage of the fact that the function recursively calls itself repeatedly with the same n, and save those results instead of recomputing them.
function memoisedFibonnaci(n: number, intermediateResults = new Map()): number {
if (intermediateResults.has(n)) {
return intermediateResults.get(n);
}
if (n <= 1) return n;
const result =
memoisedFibonnaci(n - 1, intermediateResults) +
memoisedFibonnaci(n - 2, intermediateResults);
intermediateResults.set(n, result);
return result;
}
expect(memoisedFibonnaci(80)).toBe(23416728348467685);
We can extract this memoisation logic out into a higher-order function.
function memoise<F extends (arg: any) => any>(f: F): F {
const cache = new Map();
const memoisedFunction = (arg: Parameters<F>[0]): ReturnType<F> => {
if (cache.has(arg)) {
return cache.get(arg);
}
const result = f(arg);
cache.set(arg, result);
return result;
};
return memoisedFunction as F;
}
Using memoise, our definition of the memoised Fibonacci sequence can be rewritten to look a lot closer to the unmemoised version:
const memoisedFibonnaci = memoise((n: number): number => {
if (n <= 1) {
return n;
}
return memoisedFibonnaci(n - 1) + memoisedFibonnaci(n - 2);
});
This works for more complex data structures too. The paper uses the example of finding the maximum value in a tree:
type Tree =
| { type: "branch", left: Tree, right: Tree }
| { type: "leaf", val: number };
type Direction = "left" | "right";
function leaf(val: number): Tree {
return {
type: "leaf",
val,
};
}
function branch(left: Tree, right: Tree): Tree {
return {
type: "branch",
left,
right,
};
}
function maxTree(tree: Tree) {
if (tree.type === "branch") {
return Math.max(maxTree(tree.left), maxTree(tree.right));
}
return tree.val;
}
export function maxTreePath(tree: Tree): Direction[] {
if (tree.type === "branch") {
if (maxTree(tree.left) > maxTree(tree.right)) {
return ["left", ...maxTreePath(tree.left)];
} else {
return ["right", ...maxTreePath(tree.right)];
}
}
return [];
}
const tree = branch(branch(leaf(1), leaf(2)), branch(leaf(3), leaf(4)));
expect(maxTree(tree)).toBe(4);
expect(maxTreePath(tree)).toEqual(["right", "right"]);
And here are the memoised versions:
const memoisedMaxTree = memoise((tree: Tree): number => {
if (tree.type === "branch") {
return Math.max(memoisedMaxTree(tree.left), memoisedMaxTree(tree.right));
}
return tree.val;
});
export const memoisedMaxTreePath = memoise((tree: Tree): Direction[] => {
if (tree.type === "branch") {
if (memoisedMaxTree(tree.left) > memoisedMaxTree(tree.right)) {
return ["left", ...memoisedMaxTreePath(tree.left)];
} else {
return ["right", ...memoisedMaxTreePath(tree.right)];
}
}
return [];
});
const tree = branch(branch(leaf(1), leaf(2)), branch(leaf(3), leaf(4)));
expect(memoisedMaxTree(tree)).toBe(4);
expect(memoisedMaxTreePath(tree)).toEqual(["right", "right"]);
Repeated calls to the memoised functions with the same argument will be extremely cheap, because instead of repeating their calculation, they look up previous cached results. Note that memoisedMaxTree and memoisedMaxTreePath recurse by calling themselves, not the unmemoised maxTree and maxTreePath, so all the intermediate results are also memoised, just as with memoisedFibonnaci. If we use previously evaluated trees as subtrees of new trees, the results will be reused.
const tree1 = branch(branch(leaf(1), leaf(2)), branch(leaf(3), leaf(4)));
expect(memoisedMaxTree(tree1)).toBe(4);
const tree2 = branch(leaf(99), leaf(1));
expect(memoisedMaxTree(tree2)).toBe(99);
const tree3 = branch(tree1, tree2);
expect(memoisedMaxTree(tree3)).toBe(99);
So that’s cool. The problem, though, is that this falls apart if we directly mutate a memoised tree, perhaps by changing the value of one of the leaf nodes.
const tree = branch(branch(leaf(1), leaf(2)), branch(leaf(3), leaf(4)));
const memoisedMaxTree = memoise(maxTree);
expect(memoisedMaxTree(tree)).toBe(4);
tree.left.left.val = 5;
expect(memoisedMaxTree(tree)).toBe(5); // Fails!
const tree = branch(branch(leaf(1), leaf(2)), branch(leaf(3), leaf(4)));
const memoisedMaxTreePath = memoise(maxTreePath);
expect(memoisedMaxTreePath(tree)).toEqual(["right", "right"]);
tree.left.left.val = 5;
expect(memoisedMaxTreePath(tree)).toEqual(["left", "left"]); // Fails!
Our memoised functions have no capacity to see that the tree has changed. They just see that they have been called with this object before, look up the (now wrong) cached results, and return them.
With memoisation, the solution to this problem is to blow away the cached result after mutating the object and recalculate it from scratch, but this means we lose essentially all of the benefits of memoisation. This is especially irritating for this kind of tree problem, where we have potentially only mutated a small part of the tree. Most of our recalculations will have the same result, so it would be good instead to be able to reuse the parts that have not changed.
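To make that cache-clearing workaround concrete, here is a small self-contained sketch. The memoiseWithClear helper is my own illustrative name, not something from the paper: it is the memoise function from earlier, extended to expose a clear() function that empties the cache.

```typescript
// A tree type matching the one used earlier in the post.
type Tree =
  | { type: "branch"; left: Tree; right: Tree }
  | { type: "leaf"; val: number };

// Like memoise, but exposes a clear() function for emptying the cache.
function memoiseWithClear<F extends (arg: any) => any>(f: F) {
  const cache = new Map();
  const memoised = (arg: Parameters<F>[0]): ReturnType<F> => {
    if (cache.has(arg)) return cache.get(arg);
    const result = f(arg);
    cache.set(arg, result);
    return result;
  };
  return { memoised, clear: () => cache.clear() };
}

const { memoised: max, clear } = memoiseWithClear((t: Tree): number => {
  if (t.type === "branch") {
    return Math.max(max(t.left), max(t.right));
  }
  return t.val;
});

const right: { type: "leaf"; val: number } = { type: "leaf", val: 4 };
const tree: Tree = { type: "branch", left: { type: "leaf", val: 1 }, right };

max(tree); // 4
right.val = 2;
clear(); // Throw away every cached result...
max(tree); // ...so the whole tree is recomputed from scratch: 2
```

This works, but clear() discards the results for every subtree, including the ones the mutation didn’t touch. Adapton’s contribution is to invalidate only the affected parts.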
Adapton
The idea behind Adapton is to create memoised calculations that are capable of reacting to mutation in this way. From the paper:
Adapton stores their results, how it got those results and a graph of the dependencies between those computations. Then, whenever a value is mutated through the Adapton interface, the computations depending on that value are marked dirty to indicate that their computation must be restarted. This lets Adapton keep many of the benefits of memoization while still permitting mutation.
We do this by breaking our calculations up into a collection of “Adapton thunks”, or athunks. Each athunk keeps a cached result, and a collection of other athunks that it depends on. If any of those have changed, it recalculates its result and notifies its dependants that it has changed, and they recalculate in turn. Applied to our tree problem, this means that only the subtrees that have actually changed are recalculated.
The paper breaks up its implementation into microAdapton, which is the set of core functions for creating and linking athunks, and miniAdapton, which is a slightly more ergonomic interface built on top of those core functions. We’ll start with microAdapton.
microAdapton
We create an athunk with makeAthunk(), which takes a thunk and wraps it with the required bookkeeping:
type Thunk<T> = () => T;

type Athunk<T> = {
  thunk: Thunk<T>;
  result: T | undefined;
  // An athunk can depend on athunks holding values of other types,
  // so these sets are deliberately loosely typed.
  sub: Set<Athunk<any>>;
  super_: Set<Athunk<any>>;
  clean: boolean;
};
function makeAthunk<T>(thunk: Thunk<T>): Athunk<T> {
return {
thunk,
result: undefined,
sub: new Set(),
super_: new Set(),
clean: false,
};
}
I found the terminology of “superdependencies” and “subdependencies” in the paper a bit counter-intuitive, so to reiterate:
- If athunk a depends on the result of athunk b, b is a subdependency of a.
- If athunk a depends on the result of athunk b, a is a superdependency of b.
I spent quite a lot of time thinking it was the other way around!
This network of dependency relationships between athunks is called the Demanded Computation Graph or DCG. We say that an edge exists from athunk A to athunk B if A contains B in its subdependencies and B contains A in its superdependencies; that is, A depends on B:
function adaptonAddDcgEdge<T>(super_: Athunk<T>, sub: Athunk<T>) {
super_.sub.add(sub);
sub.super_.add(super_);
}
function adaptonDelDcgEdge<T>(super_: Athunk<T>, sub: Athunk<T>) {
super_.sub.delete(sub);
sub.super_.delete(super_);
}
We get an athunk’s value with the adaptonCompute function, which returns the cached value if it is clean, and recalculates otherwise. If we recalculate, we first remove the DCG edges to the athunk’s subdependencies, as they are no longer valid:
function adaptonCompute<T>(adapton: Athunk<T>): T {
if (adapton.clean) {
return adapton.result!; // Result is always non-null when clean is true.
}
for (const x of adapton.sub) {
adaptonDelDcgEdge(adapton, x);
}
adapton.clean = true;
adapton.result = adapton.thunk();
return adaptonCompute(adapton);
}
We can mark an athunk (and any athunks that depend on it) as dirty with adaptonDirty():
function adaptonDirty<T>(adapton: Athunk<T>) {
if (adapton.clean) {
adapton.clean = false;
for (const x of adapton.super_) {
adaptonDirty(x);
}
}
}
The adapton graph can contain two basic types of structure, both made with athunks. There are mutable cells (or “refs”), which represent base values that can be changed, and there are derived values, which represent calculations on those values. These calculations can reference cells, or other derived values, or both. A cell is created with adaptonRef:
function adaptonRef<T>(val: T): Athunk<T> {
const a: Athunk<T> = {
thunk: () => a.result!,
result: val,
sub: new Set(),
super_: new Set(),
clean: true,
};
return a;
}
The ref’s thunk returns its own result value, so we can update it just by changing the result field directly. We also need to be careful to mark it dirty while doing so, which the adaptonRefSet function does:
function adaptonRefSet<T>(adapton: Athunk<T>, val: T): void {
if (adapton.result !== val) {
adapton.result = val;
adaptonDirty(adapton);
}
}
These are all we need to get some self-adjusting computation:
// Create two mutable cells.
const r1 = adaptonRef(8);
const r2 = adaptonRef(10);
// Create a derived value
const a = makeAthunk(() => {
adaptonAddDcgEdge(a, r1);
adaptonAddDcgEdge(a, r2);
return adaptonCompute(r1) - adaptonCompute(r2);
});
expect(adaptonCompute(a)).toBe(-2);
// Mutate one of the cells
adaptonRefSet(r1, 2);
// Unlike memoisation, adapton responds to this change and
// still produces the correct answer.
expect(adaptonCompute(a)).toBe(-8);
miniAdapton
microAdapton works, but it’s a bit cumbersome to use and quite fragile. You have to remember to add DCG edges manually, and if you don’t do it correctly the system will break in confusing ways. We can make it more pleasant with a helper function, adaptonForce, which automatically adds the appropriate edges:
let currentlyAdapting: Athunk<unknown> | null = null;
function adaptonForce<T>(adapton: Athunk<T>) {
let prevAdapting = currentlyAdapting;
currentlyAdapting = adapton;
const result = adaptonCompute(adapton);
currentlyAdapting = prevAdapting;
if (currentlyAdapting != null) {
adaptonAddDcgEdge(currentlyAdapting, adapton);
}
return result;
}
Here’s the usage:
// Mutable cells
const r1 = adaptonRef(8);
const r2 = adaptonRef(10);
// Derived value
const a = makeAthunk(() => adaptonForce(r1) + adaptonForce(r2));
expect(adaptonForce(a)).toBe(18);
adaptonRefSet(r1, 9);
expect(adaptonForce(a)).toBe(19);
adaptonRefSet(r2, 11);
expect(adaptonForce(a)).toBe(20);
There are also some functions for combining memoisation with adapton:
function adaptonMemoiseLazy<F extends (arg: any) => any>(f: F) {
return memoise((arg: Parameters<F>[0]) => {
return makeAthunk(() => f.call(null, arg));
});
}
function adaptonMemoise<F extends (arg: any) => any>(f: F) {
const f_ = adaptonMemoiseLazy(f);
return (arg: Parameters<F>[0]) => adaptonForce(f_(arg));
}
adaptonMemoiseLazy takes a function that computes a value and turns it into a function that returns an athunk representing that computation (a thunk that produces an athunk; yes, it’s confusing). It then memoises that function.
adaptonMemoise takes a function that computes a value, calls adaptonMemoiseLazy to produce a memoised athunk-thunk, then returns a function that calls adaptonForce on that athunk when called. The end result is a function with the same type signature as the original, but set up to return the result of forcing a memoised athunk: when we call the function with the same argument multiple times, we don’t rebuild the athunk, we reuse it.
The rest of the paper is devoted to convenience macros that make building up complex adapton data structures and functions easier; unfortunately JavaScript doesn’t have first-class support for macros, which makes these hard (if not impossible) to implement. But we can use adaptonForce and adaptonMemoise to solve our initial tree problems. We need to slightly adjust our maxTree and maxTreePath functions to account for the fact that they might be dealing with athunks:
function isAdapton<T>(x: any): x is Athunk<T> {
return x != null && typeof x === "object" && x.super_ != null;
}
const adaptonMemoisedMaxTree = adaptonMemoise(
(t: Tree | Athunk<Tree>): number => {
if (isAdapton(t)) {
return adaptonMemoisedMaxTree(adaptonForce(t));
} else if (t.type === "branch") {
return Math.max(
adaptonMemoisedMaxTree(t.left),
adaptonMemoisedMaxTree(t.right)
);
} else {
return t.val;
}
}
);
const adaptonMemoisedMaxTreePath = adaptonMemoise(
(tree: Tree | Athunk<Tree>): Direction[] => {
if (isAdapton(tree)) {
return adaptonMemoisedMaxTreePath(adaptonForce(tree));
}
if (tree.type === "branch") {
if (
adaptonMemoisedMaxTree(tree.left) > adaptonMemoisedMaxTree(tree.right)
) {
return ["left", ...adaptonMemoisedMaxTreePath(tree.left)];
} else {
return ["right", ...adaptonMemoisedMaxTreePath(tree.right)];
}
}
return [];
}
);
Just as with the initial memoise examples, when the anonymous functions wrapped in adaptonMemoise recurse, they don’t call themselves directly. They call the memoised versions of themselves (adaptonMemoisedMaxTree and adaptonMemoisedMaxTreePath), so each stage of the computation is memoised as a separate derived athunk value.
Finally we can build up a tree out of athunks and evaluate it:
function aLeaf(val: Athunk<number>): Athunk<Tree> {
return makeAthunk(() => {
return {
type: "leaf",
val: adaptonForce(val),
};
});
}
function aBranch(left: Athunk<Tree>, right: Athunk<Tree>): Athunk<Tree> {
return makeAthunk(() => {
return {
type: "branch",
left: adaptonForce(left),
right: adaptonForce(right),
};
});
}
const r1 = adaptonRef(1);
const r2 = adaptonRef(2);
const r3 = adaptonRef(3);
const r4 = adaptonRef(4);
const aTree = aBranch(
aBranch(aLeaf(r1), aLeaf(r2)),
aBranch(aLeaf(r3), aLeaf(r4))
);
expect(adaptonMemoisedMaxTree(aTree)).toBe(4);
expect(adaptonMemoisedMaxTreePath(aTree)).toEqual(["right", "right"]);
adaptonRefSet(r1, 5);
expect(adaptonMemoisedMaxTree(aTree)).toBe(5);
expect(adaptonMemoisedMaxTreePath(aTree)).toEqual(["left", "left"]);
It works! Whenever adaptonMemoisedMaxTree or adaptonMemoisedMaxTreePath recurse down the tree, they look up the athunk in the memoisation cache that corresponds to that unit of work. If the athunk is clean, they just return its value without recursing further. If the athunk is dirty, they do recompute. Mutating a single leaf node marks every athunk between that leaf and the root as dirty, so the max value will be recomputed, but the parts of the tree unaffected by the mutation remain clean.