Ruminating about mutable value semantics

(This is part of a series on the design of zest. See the list of posts here.)

I have two goals for zest that are in tension:

Be a reasonably efficient imperative language (eg go, julia).
Treat values as data (eg erlang, clojure, see the shape of data).

In particular, for goal 2 I want to guarantee that deserialize(serialize(x)) == x .

It's tricky to satisfy both goals, because as soon as you allow mutable references you can create circular data-structures and have to deal with questions of identity vs value. Javascript, for example, doesn't satisfy goal 2 even though it has pervasive serialization in the form of json:

> let x = []
undefined
> JSON.parse(JSON.stringify(x)) == x
false

> let y = new Date()
undefined
> typeof(JSON.parse(JSON.stringify(y))) == typeof(y)
false

The only approach I know of to square this circle is mutable value semantics. The core idea is to allow mutation but prevent mutable references from being observed or aliased. The result looks like this:

var xs = list[string]['a','b','c']
let ys = xs // ys is an independent value, not an alias of x!
append(xs@, ys)
print(length(xs)) // prints 6
print(length(ys)) // prints 3

Enforcing these aliasing rules is the crux and there are many options for how to do it. Let's look at three in particular: implicit copies, dynamic alias tracking, and static alias tracking.

implicit copies

The simplest option is to always require a copy when moving into or out of a mutable value:

var xs = list[string]['a','b','c']
let ys = xs // deep copy xs
append(xs@, ys)
print(length(xs)) // deep copy xs?!
print(length(ys)) // deep copy ys?!

This actually works pretty well when we're using a mutable value mutably, but we really don't want that last copy for length(xs)!

dynamic alias tracking

The solution used in the toy language swiftlet in the original paper is to track aliasing dynamically by reference-counting large values likes lists. Where we would have copied xs before, now we just need to increment it's refcount. But now every time we mutate a value we have to check the refcount and maybe make a defensive copy.

var xs = list[string]['a','b','c']
let ys = xs // increment the refcount of xs
append(xs@, ys) // shallow copy xs and increment the refcount of the strings
print(length(xs)) // increment the refcount of xs
print(length(ys)) // increment the refcount of ys

Refcounting makes writing code very easy but also:

Makes it easy to accidentally copy a large value when you meant to mutate it in place.
Requires sophisticated optimizations to remove the performance overhead of modifying refcounts all the time.

static alias tracking

The authors of the swiftlet paper are currently working on a systems language called hylo. Swiftlet's implicit, unpredictable allocation is unappealing in a systems language so hylo instead tracks aliasing statically. All values have move semantics and the language is extended with immutable references as well as mutable references. Using the same syntax as the previous examples, this would look like:

var xs = list[string]['a','b','c']
let ys = deep-copy(xs) // removing this copy produces a compile error
append(xs@, ys) 
print(length(xs&)) // pass immutable reference to xs
print(length(ys&)) // compile error - ys was consumed by append - write append(xs@, deep-copy(ys)) instead

This allows hylo to have mutable value semantics while completely avoiding implicit copies and refcounting overhead, but at the cost of an additional burden on the programmer to annotate behaviour, and also increasing the number of programs which won't type-check.

I'm expecting expressiveness to follow a similar trajectory to rust. Early versions of rust were often frustratingly inexpressive but over time language extensions, libraries and idioms evolved to fill most of the gaps without sacrificing the core reasoning guarantees.

One of the gap-fillers hylo is experimenting with is remote parts, which introduce a limited form of references to the type system. This doesn't break the reasoning guarantees that hylo cares about - you still can't observe aliasing and deallocation remains deterministic. But it is very hard to square with my goal of transparent serialization - if I deserialize a value containing a remote part, what is the lifetime of that part and who is responsible for freeing it?

So programs probably become a bit more annoying to write, and at least one of the amerliorations is a feature that I will find difficult to copy.

alternatives?

I spent the last month filling notebooks, trying to figure out the contours of the design space. Some possible contours:

If references are ever observable then transparent serialization becomes hard. They have to remain totally 2nd class.
Moves metastasize. As soon as I add move semantics to anything I also need borrows and I end up in the rust/hylo valley.
Borrows don't metastatize. Weirdly, it seems totally reasonable to have borrows without also having move semantics.
Trying to split the world into 'things that need move semantics' and 'things that can be loosey-goosey' results in having all the complexity of move semantics plus a whole other world, and a bunch of rules for how to move between them. It never worked out simpler than just doing rust/hylo plus Rc<T>.

I only stumbled into one valley that felt meaningfully different from existing options:

Owned-vs-borrowed does not appear in the type-system. Instead every reference to a heap-allocated value contains a bit indicating whether it is owned or borrowed.
- In many cases the compiler can infer the value of this bit, avoiding the branch on drop.
When an l-value is converted to an r-value:
- If the destination has a longer lifetime then the source, or if the destination is a mutable reference, require a copy.
  - For small values (integers, floats, structs/unions with small fields) the copy can be implicitly inserted by the compiler.
  - For large values (strings, lists, maps) the copy must be explicit in the code.
  - In many cases the compiler can elide the copy by reasoning about ownership, so eg returning a value through a chain of N functions only requires one copy, not N copies.
- If the destination has a shorter lifetime than the source and is not a mutable reference:
  - The destination is a borrow of the source.
  - If the l-value ends with @, the destination is a mutable borrow of the source.
  - Mutable borrows are second-class - they can only be created inside function args and must correspond to mutable function params.
  - Enforce the usual shared xor mutable rules for borrows.
*T (pronounced shared T) is considered a small value and can be implicitly copied.
- If T is small then *T = T.
- If T is large then *T is a reference-counted version of T.
- If x: *T then x.* is a immutable reference to the contents.
- There is no way to get a mutable reference to the contents.

var xs = list[*string]['a','b','c']
let ys = copy(xs) // shallow copy xs and increment the refcount of the strings
append(xs@, ys) // pass immutable reference to ys, the body of append increments the refcount of the strings
print(length(xs)) // pass immutable reference to xs
print(length(ys)) // pass immutable reference to ys

This valley has some interesting tradeoffs:

Since function args are always borrows, we only pay for the overhead of ref-counting when ref-counted values actually escape.
If x is borrowed then deserialize(serialize(x)) is owned, but the programmer can't observe the difference. So we still get transparent serialization.
There is no need for remote parts or similar features because fields can be owned or borrowed regardless of type. They still can't escape without a copy.
Assigning to a mutable reference always requires a copy, but when I look at code I've written in the past usually the value is either:
- A value stored inline (int/float/struct etc) that would have to be copied even in rust/hylo.
- Something that is never mutated and so could be marked shared.

I'm not sure if it's a good valley. It's certainly an interesting valley. But I am a cowardly traveller, and will probably follow the clearly marked signposts down the established trail to the swiftlet valley.

Swiftlet is just a toy language though, so some tweaks are need on the way. Changes I'm considering:

closing over mutable references
immutably closing over mutable references
allowing aliased closures
enabling call by reference
result location semantics

closing over mutable references

Swiftlet has no way to generically iterate over mutable references. You can't (easily) write an external iterator because you can't store a mutable references and you can't write an internal interator because you can't close over mutable references.

So I'll allow closing over mutable references. Here is some code that calculates a running sum over a list using an internal iterator.

var total = 0
each(nums@, (i, num@) {
   total@ += num
   num@ = total
})

Such closures aren't allow to escape the lifetime of the references they close over and are treated as an alias of those references.

// compile error - closure aliases nums
each(nums@, (i, num@) {
   nums.{i+1}@ += num
})

// compile error - closure may not escape lifetime of nums
nums-inc = (i) {
  @nums.{i} += 1
}
return nums-inc

But the existence of such a closure won't otherwise prevent other mutable references to the same variable. This makes them useful for macro-like helper functions:

// closes over compiler, not compiler.expr_data/type
emit = (data, type) {
  append(compiler.expr_data@, data)
  append(compiler.expr_type@, type)
  return compiler.expr_data.len - 1
}
// can still use compiler
compiler.expr_data[some-expr].i32@ += 1
// can still use emit
expr = emit([i32-const: 1], [i32: []])

immutably closing over mutable references

Here is another way we could have written that running sum:

each(nums@, (i, num@) {
  if i > 0 {
    num@ += nums.{i-1}
  }
})

In a regular imperative language, code like this would work fine.

If we wrote something like this in rust, we'd get a compile error. The list nums is borrowed immutably by the closure and mutably by each-mut at the same time, which is not allowed. Annoying, but we can work around it by writing the code differently.

Under swiftlet rules the code is accepted by the compiler, but returns the wrong result! Creating the closure increments the refcount of nums, and then each-mut sees that the reference count for nums is more than zero and so defensively copies nums before mutating it. But the closure still refers to the original unmutated copy of nums.

This behaviour is surprising to anyone used to any other imperative language. Instead I'll make closures capture mutable variables by reference instead of by value, even if they only use them immutably.

// compile error - closure aliases nums
each(nums@, (i, num@) {
  if i > 0 {
    num@ += nums.{i-1}
  }
})

If we actually want to capture by value we can explicitly copy:

nums-copy = copy(nums)
each(nums@, (i, num@) {
  if i > 0 {
    // Now it's more obvious that this code does the wrong thing.
    num@ += nums-copy.{i-1}
  }
})

I'll treat these closures as immutable aliases during checking though, so it's safe to pass multiple such closures as args, so long as there are no args that mutably alias the same variables.

// this is ok
do-something-with(() nums.0, () nums.1)

// compile error - closure aliases nums
do-something-with(() nums.0, () { nums.1@ += 1 })

// compile error - closure aliases nums
do-something-with(() nums.0, nums.1@)

allowing aliased closures

Much more tentatively, I'm considering allowing closures to mutably alias. So case 2 from the previous example would now be allowed:

// this would be ok
do-something-with(() nums.0, () { nums.1@ += 1 })

This allows using closures to abstract over branching control flow, rather than just straight-line control flow:

if(
  i > 0, 
  () { i@ = 1 }, 
  () { i@ = -1 },
)

while(
  () i > 0,
  () i@ -= 1,
)

visit(
  tree,
  on-branch: (branch) handle-branch(@compiler, branch),
  on-leaf: (leaf) handle-leaf(@compiler, leaf),
)

Unlike the previous tweaks, this tweak actually sacrifices some expressivity elsewhere.

// no longer safe - g might mutably alias h
f = (g, h) g(h)

You could also argue that we also lose some reasoning power - we no longer know whether it's safe to change f(); g() into g(); f() if f and g might be aliased. But this was already the case because of IO - we'd need an effect system before this actually mattered.

enabling call by reference

Consider these two calls to f:

x = [small: 1, big: really-big-struct]
f(x@, really-big-struct)
f(x@, x.big)

In the first call, we'd hope that the compiler passes really-big-struct by reference rather than by value - much cheaper. But in the second call that isn't safe - it would cause observable aliasing if f mutates x.big through the first arg. So for the second call the compiler would have to first copy x.big to the stack and then pass the copy by reference.

We're already accepting that any write through a mutable reference might cause an implicit copy of the mutated value, so maybe an implicit copy of a struct here and there isn't such a big deal in comparison. But if we wanted to fix it we could say that x.big actually is an alias.

// compile error - x.big overlaps with x
f(x@, x.big)

// compile error - big is a reference to x.big, which overlaps with x
big = x.big
f(x@, big)

// this is ok
f(x@, copy(x.big))

result location semantics

Consider:

big = big-to-big(x.big)

Big structs won't be returned by value. Instead, the callee will allocate space for big on the struct and pass a pointer to that space as an extra argument to big-to-big.

But what if we write:

x.big@ = big-to-big(y.big)

In most languages it's not safe to pass a pointer to x.big as the return value because it might alias y.big. Instead most compilers will allocate temporary space for the return value, call the function, and then copy the return value into x.big.

But we have perfect alias tracking! We can avoid this copy. Except in this case:

x.big@ = big-to-big(x.big)

Now it's definitely an alias. We could detect this case and insert an implicit copy. Or we could make it explicit:

// compile error - x.big overlaps with return location x.big
x.big@ = big-to-big(x.big)

// this is ok - copy doesn't propagate the result location
x.big@ = copy(big-to-big(x.big))

If I could just implement each approach and compare ergonomics and performance then making decisions would be much easier. But I only have one lifetime, so I'm trying to balance moving the actual implementation forward in incremental baby steps while also writing imaginary code to figure out how each approach will feel.