Picture a traditional webapp. We have a bunch of stateless workers connected to a stateful, relational database. This is a setup with a number of excellent properties:
- All state can be queried using a uniform API - SQL. This enables flexible ad-hoc exploration of the application state as well as generic UIs like the Django admin.
- Every item of state has a unique and predictable name by which it can be identified - a unique key in the table.
- Access to state can be restricted and controlled. Transactions prevent different workers from interfering with each other. ACLs can give each worker access to only the information it needs, which limits the scope of mistakes.
- Changes to state can be monitored. Tailing the transaction log is an effective way to stream to a backup server or to debug errors that occurred in production. One can reconstruct the state of the database at any point in time.
- State is separate from code. You can change client code, rename all your classes or restart workers on the fly and the state in the database will be unharmed.
The database state is also pervasive, mutable and global.
Now let's zoom in and look at our imperative, object-oriented workers:
- State is encapsulated in objects and hidden behind an ad-hoc collection of methods.
- Objects are effectively named only by their location in the (cyclic, directed) object graph or their position in memory.
- Access to state can be restricted and controlled through encapsulation. Concurrent modifications are a constant source of bugs. Access control is ad-hoc and transitive - if you can walk the object graph to an object you can access it.
- Changes to state are usually monitored via ad-hoc methods such as manually inserting debug statements. Approximating history by watching debug statements and reconstructing state in one's head is the normal method of debugging.
- State is entangled with code. Portable serialization is difficult. Live-coding works to some extent but requires reasoning about the interaction of state and code (e.g. in JavaScript, redefining a function does not modify old instances that may still be hanging around in data structures or in callbacks).
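The last point is easy to reproduce. Here is a minimal sketch in Python rather than JavaScript (the function and list names are made up for illustration): redefining a function only rebinds the name, so anything that captured the old function object keeps running the old code.

```python
# Hypothetical illustration: all names and values are invented for this sketch.
callbacks = []

def greet():
    return "hello v1"

# A long-lived data structure captures the function object itself.
callbacks.append(greet)

# "Live-code" a new definition. This only rebinds the name `greet`.
def greet():
    return "hello v2"

new_result = greet()         # looks up the name, finds the new function
old_result = callbacks[0]()  # still calls the old function object
```

After the redefinition the program contains two versions of `greet` at once, and which one runs depends on how the caller got hold of it.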
Functional programmers need not look so smug at this point. The Haskell/OCaml family struggles to redefine types at runtime or handle live data migrations (the declaration of a nominal type is a side-effect in a live language). Clojure does better on these points but still gets burned by nominal types (e.g. extend a deftype/defrecord and then re-evaluate the definition) and more generally by treating the definition of new code as mutation of state (which has to be papered over by tools.namespace).
Why are these points important? We spend most of our time not writing code but reasoning about code, whether hunting for bugs, refactoring old code or trying to extend a module. We end up with questions like:
- When did this state change?
- What caused it to change?
- Did this invariant ever break?
- How did this output get here?
How do we answer these questions?
In the database we have a transaction log containing, for each transaction, the queries involved, the commit time, the client name etc. We can write code that specifies the condition we are interested in via an SQL query, locates the relevant transactions by running through the log and then recreates the state of the database at that point. This works even if the error happened elsewhere - just have the user ship you their transaction log.
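As a rough sketch of that workflow, assume the log is available as a list of (commit time, client, SQL statements) tuples. Real transaction logs are database-specific and need the vendor's own tooling to read, so the `LOG` format, the `accounts` table and the `find_first_violation` helper below are all invented for illustration. The condition is an ordinary SQL query, and the search replays the log into a scratch SQLite database until the query first returns rows.

```python
import sqlite3

# Hypothetical transaction log: (commit_time, client, statements).
# The schema, client names and values are made up for this sketch.
LOG = [
    (1, "worker-a", ["CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)"]),
    (2, "worker-a", ["INSERT INTO accounts VALUES (1, 100)",
                     "INSERT INTO accounts VALUES (2, 50)"]),
    (3, "worker-b", ["UPDATE accounts SET balance = balance - 30 WHERE id = 2"]),
    (4, "worker-c", ["UPDATE accounts SET balance = balance - 40 WHERE id = 2"]),
    (5, "worker-a", ["UPDATE accounts SET balance = balance + 10 WHERE id = 1"]),
]

# The condition we are interested in: any account with a negative balance.
VIOLATION = "SELECT id, balance FROM accounts WHERE balance < 0"

def find_first_violation(log, condition):
    """Replay the log into a scratch database and stop after the first
    transaction for which `condition` returns any rows."""
    db = sqlite3.connect(":memory:")
    for commit_time, client, statements in log:
        for sql in statements:
            db.execute(sql)
        db.commit()
        rows = db.execute(condition).fetchall()
        if rows:
            # `db` now holds the database state as of this transaction.
            return commit_time, client, rows, db
    return None

result = find_first_violation(LOG, VIOLATION)
```

With this toy log, `result` identifies the transaction committed at time 4 by `worker-c`, and the returned connection can be queried further to inspect everything else as it stood at that moment.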
In the worker, we have two familiar workhorses:
- Manually add print statements, recompile the code, try to recreate the conditions we are interested in and then reconstruct the causality by reading the printed statements.
- Add a breakpoint in the debugger, try to recreate the conditions we are interested in and then step through the code line by line.
What these two have in common is that they are both achingly manual. There is no easy way to automate the process. There are no libraries full of debugging strategies that you can deploy. The questions we have are about time and causality but our current tools restrict us to looking at tiny slices of space (print statements) or time (debuggers) and offer no way to automate our actions.
I propose that if we were to manage state more like a database and less like a traditional imperative language then understanding and debugging programs would become easier.
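To make the proposal slightly more concrete, here is one possible shape it could take inside a worker. This is a toy sketch under my own assumptions, not a design: the `LoggedStore` and `Change` names, the `cause` strings and the shopping-cart data are all hypothetical. Every write goes through one place and is appended to a log, so the questions from earlier become queries over that log.

```python
import time
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Change:
    seq: int     # position in the log
    at: float    # wall-clock time of the change
    key: str     # the unique name of the piece of state
    old: Any
    new: Any
    cause: str   # free-form description of what triggered the change

class LoggedStore:
    """Hypothetical key-value store that records every write."""

    def __init__(self):
        self._state = {}
        self.log = []

    def get(self, key, default=None):
        return self._state.get(key, default)

    def set(self, key, value, cause):
        change = Change(len(self.log), time.time(), key,
                        self._state.get(key), value, cause)
        self._state[key] = value
        self.log.append(change)

    def history(self, key):
        """When did this state change, and what caused it?"""
        return [c for c in self.log if c.key == key]

    def state_at(self, seq):
        """Recreate the state after changes 0..seq (inclusive)."""
        state = {}
        for c in self.log[: seq + 1]:
            state[c.key] = c.new
        return state

    def first_violation(self, invariant):
        """Did this invariant ever break? Return the offending change."""
        state = {}
        for c in self.log:
            state[c.key] = c.new
            if not invariant(state):
                return c
        return None

store = LoggedStore()
store.set("cart_total", 0, cause="init")
store.set("cart_total", 25, cause="add_item(book)")
store.set("cart_total", -5, cause="apply_coupon(SAVE30)")
store.set("user", "alice", cause="login")

bad = store.first_violation(lambda s: s.get("cart_total", 0) >= 0)
```

Here `bad.cause` names the write that broke the invariant, and `store.state_at(bad.seq - 1)` recovers the state just before it. Unlike print statements or breakpoints, these checks are plain functions that can be reused and collected into a library.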