(This is part of a series on the design of zest. See the list of posts here.)
Questions include:
- Is compilation deterministic? Can the output vary depending on the target platform? On environment variables? On the time of day? On the order in which modules are compiled?
- Is incremental compilation possible? Does it have well-defined semantics or does the output depend on the order in which changes are made?
- Can compilation be parallelized, or does the order of compilation matter?
- Are there well-defined semantics for changing code in a running process? What happens to old closures and old types?
- How does compile-time evaluation interact with binding? Can compile-time code create new bindings or modules? Are all declarations evaluated, or only reachable declarations, or are they evaluated lazily at runtime?
- Is it possible to determine at compile-time which declarations are reachable, or can runtime reflection reach any declaration? Is it possible to tree-shake, ie only emit code for declarations that are reachable?
- If values are constructed at compile-time and used at runtime, how are they moved/serialized between stages?
- Does code always behave the same when evaluated at compile-time vs runtime, or are there different dialects?
- What happens if compile-time evaluation doesn't terminate?
See there are no strings on me for context.
Not covering every language under the sun. Mostly focusing on languages where I have real experience, plus a smattering of interesting unfamiliar languages at the end.
I'm interested in hearing about languages where the answers to the questions above differ significantly from the languages below.
zig
> zig version
0.14.0
Every type has an associated namespace.
const Foo = struct {
x: usize, // instances of Foo will have a field `x`
pub const bar = 1; // this is a constant attached to the Foo type
pub fn quux() void {} // this is a function attached to the Foo type
};
@import("./foo.zig") returns a struct type whose associated namespace contains all the public top-level declarations in that file. Every call to @import("./foo.zig") returns the same struct type, so circular imports are fine.
Types are first-class values in the compile-time dialect but cannot be referenced at all in the runtime dialect. This enables a fair amount of meta-programming at compile-time without giving up precise tree-shaking.
Declarations are wrapped in a lazy thunk and only forced if either evaluated at compile-time, or reachable from exported functions after specialization and compile-time evaluation.
const foo = 1 / 0; // this is fine because it's not reachable
const bar = 1 / 0; // this is an error because it's reachable
pub fn main() void {
_ = bar;
}
Bindings are resolved eagerly at parse-time:
fn foo() void {
nonsense(); // error
}
pub fn main() void {}
But accessing a declaration from a namespace is resolved only when evaluated or reachable:
const real = struct {};
fn foo(x: usize) void {
if (x == 0) {
real.nonsense();
}
}
pub fn main() void {
comptime foo(42); // ok, because branch body isn't evaluated
foo(42); // error, because branch body is reachable
}
Declarations can be mutually recursive. The compiler detects cycles:
// error: struct 'test.T' depends on itself
const T = struct {
a: if (@sizeOf(T) == 8) u32 else u64,
};
Compile-time evaluation is deterministic and cannot perform side-effects (but can access variables set by the build file for conditional compilation).
fn foo() void {
@import("std").debug.print("hello!", .{});
}
pub fn main() void {
comptime foo(); // error, no printing at compile-time
foo(); // ok, printing is allowed at runtime
}
Compile-time evaluation emulates the target platform, so cross-compilation works naturally.
pub fn main() void {
const comptime_word_size = comptime @typeInfo(usize).int.bits;
const runtime_word_size = @typeInfo(usize).int.bits;
@import("std").debug.assert(comptime_word_size == runtime_word_size);
}
It's intended that the order of declarations and the order of their lazy evaluation do not matter. At the time of writing it's possible to observe the order of evaluation using usingnamespace, but this is slated to be removed.
const T1 = struct {
pub const x = 0;
};
const T2 = struct {
const a = @typeInfo(T2).@"struct".decls.len; // evaluated before `usingnamespace` so doesn't include `x`
pub usingnamespace (if (a == 0) T1 else T1); // depends on `a` so forces `a` to be evaluated earlier
const b = @typeInfo(T2).@"struct".decls.len; // evaluated after `usingnamespace` so includes `x`
};
pub fn main() void {
@import("std").debug.assert(T2.a == 0);
@import("std").debug.assert(T2.b == 1);
}
Comptime zig is a slightly different dialect from runtime zig, so staging affects behaviour:
const List = struct {
next: ?*const List,
};
fn list(n: usize) *const List {
return &List{ .next = if (n == 0) null else list(n - 1) };
}
pub fn main() void {
@import("std").debug.print("{}\n", .{comptime list(3)});
@import("std").debug.print("{}\n", .{list(3)});
}
test.List{ .next = test.List{ .next = test.List{ .next = test.List{ ... } } } }
test.List{ .next = test.List{ .next = General protection exception (no address available)
In the list function above, at comptime each list node is heap-allocated and garbage-collected, but at runtime each list node is stack-allocated and returning a pointer to a node is UB.
Arbitrary data-structures can be passed from compile-time to runtime, and are baked into the executable's constant section. To guarantee deterministic compilation, code evaluated at compile-time is not allowed to convert pointers to integers.
Types can be parameterized by arbitrary values.
fn T(comptime param: anytype) type {
return struct {
pub const value = param;
};
}
I don't think the equality rules for such type parameters are documented? From some experimentation, it seems that:
- Parameters of different types are not equal.
- Parameters of the same type are compared by structural equality.
- Types are not allowed to close over pointers to mutable variables.
- Constructing cyclic parameters using const declarations is detected as a dependency loop.
Excessive compile-time computation is detected by counting the number of backwards edges traversed by the interpreter.
fn wide(n: usize) usize {
if (n == 0) {
return 0;
} else {
return wide(n - 1) + wide(n - 1);
}
}
pub fn main() void {
comptime wide(30);
}
> zig run test.zig
test.zig:5:20: error: evaluation exceeded 1000 backwards branches
return wide(n - 1) + wide(n - 1);
~~~~^~~~~~~
test.zig:5:20: note: use @setEvalBranchQuota() to raise the branch limit from 1000
test.zig:5:20: note: called from here (23 times)
test.zig:5:34: note: called from here (5 times)
return wide(n - 1) + wide(n - 1);
~~~~^~~~~~~
test.zig:10:18: note: called from here
comptime wide(30);
~~~~^~~~
referenced by:
posixCallMainAndExit: /nix/store/5yjsvw9r543c8h1mq6ylcgk6w3aphwb2-zig/lib/std/start.zig:647:22
_start: /nix/store/5yjsvw9r543c8h1mq6ylcgk6w3aphwb2-zig/lib/std/start.zig:464:40
3 reference(s) hidden; use '-freference-trace=5' to see all references
The limit can be raised locally with @setEvalBranchQuota. This is nicely composable - if I test that I've raised the quota enough within my library, I don't have to worry that users of my library will hit errors from adding their own compile-time computation.
rust
> rustc --version
rustc 1.82.0 (f6e511eec 2024-10-15)
Modules are entirely second-class and there is no way to refer to a module directly. Cyclic dependencies are allowed between different modules in the same crate.
A limited subset of the language is available at compile time for defining constants and type parameters.
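As a rough illustration of what that subset allows (this sketch and the names in it are mine, not from the original), a const fn can use arithmetic, branching, and loops, but not, eg, heap allocation:

```rust
// Sketch: const fns may use arithmetic, `if`, and `while`,
// but (as of rustc 1.82) not heap allocation or trait objects.
const fn fib(n: u64) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    let mut i = 0;
    while i < n {
        let t = a + b;
        a = b;
        b = t;
        i += 1;
    }
    a
}

// Evaluated entirely at compile time.
const FIB_10: u64 = fib(10);

fn main() {
    assert_eq!(FIB_10, 55);
}
```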
All constants are evaluated strictly, even if not reachable.
// error: attempt to divide `1_usize` by zero
const x: usize = 1/0;
pub fn main() -> () {}
Cycles are detected statically, rather than dynamically as in zig, ie unreachable code can still trigger cycle detection.
// error: cycle detected when simplifying constant for the type system `x`
const x: usize = y;
const y: usize = if false {x} else {1};
pub fn main() -> () {}
This means that cycle detection doesn't depend on eg what platform the code is compiled on, but on the other hand requires a separate macro system for conditional compilation.
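That separate system is the cfg attribute language. A sketch (my example, not from the original) of how a platform-dependent constant is selected before type-checking, rather than during const evaluation:

```rust
// Sketch: conditional compilation happens via #[cfg] attributes,
// which remove items before type-checking and const evaluation,
// rather than via a compile-time `if` as in zig.
// (Assumes a 32- or 64-bit target.)
#[cfg(target_pointer_width = "64")]
const WORD_BITS: usize = 64;

#[cfg(not(target_pointer_width = "64"))]
const WORD_BITS: usize = 32;

fn main() {
    // The discarded item never existed as far as the type system
    // is concerned, so it can't trigger type errors or cycles.
    assert_eq!(WORD_BITS, usize::BITS as usize);
}
```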
Compile-time constants are also type-checked before evaluation.
// error: expected `usize`, found `&str`
const x: usize = if false {"foo"} else {1};
pub fn main() -> () {}
Long-running const fns are detected by the interpreter. Rust doesn't currently provide the whole stack trace, although I'm sure they could provide an option for that.
const fn wide(n: usize) -> usize {
if n == 0 {
0
} else {
wide(n-1) + wide(n-1)
}
}
// error: constant evaluation is taking a long time
const x: usize = wide(25);
pub fn main() -> () {}
> cargo run --bin main
Compiling testrs v0.1.0 (/home/jamie/testrs)
error: constant evaluation is taking a long time
--> src/bin/main.rs:5:8
|
5 | wide(n-1) + wide(n-1)
| ^^^^^^^^^
|
= note: this lint makes sure the compiler doesn't get stuck due to infinite loops in const eval.
If your compilation actually takes a long time, you can safely allow the lint.
help: the constant being evaluated
--> src/bin/main.rs:10:1
|
10 | const x: usize = wide(25);
| ^^^^^^^^^^^^^^
= note: `#[deny(long_running_const_eval)]` on by default
I couldn't find any actual documentation on the limits, but the commit introducing this error mentions counting function calls and back-edges.
The errors can be downgraded to warnings by annotating the const value with #[allow(long_running_const_eval)]. Putting this annotation on a const function doesn't seem to have any effect though - it has to be on the const value that the error was attributed to.
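A sketch of that placement (the small argument is mine, chosen so the example terminates quickly rather than actually tripping the lint):

```rust
const fn wide(n: usize) -> usize {
    if n == 0 { 0 } else { wide(n - 1) + wide(n - 1) }
}

// The attribute goes on the const item being evaluated,
// not on the const fn it calls.
#[allow(long_running_const_eval)]
const X: usize = wide(16); // small enough to finish promptly

fn main() {
    // wide always returns 0; it only burns evaluation time
    assert_eq!(X, 0);
}
```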
julia
> julia --version
julia version 1.11.3
Modules are declared with module ... end. Modules don't inherit their enclosing scope but can access it explicitly with parentmodule.
x = 1
module Test
print(x) # error
print(parentmodule(Test).x) # ok
end
The module system has no relation to files. include("foo.jl") behaves exactly like textual inclusion and doesn't require creating a module. Using include("foo.jl") in the repl is equivalent to copy-and-pasting the contents of foo.jl.
Modules have identity. Redefining a module simply rebinds the name to a new module, but does not affect the old module.
julia> module Foo; x = 1; end
WARNING: replacing module Foo.
Main.Foo
julia> module Bar; p() = print(Main.Foo.x); end
Main.Bar
julia> Bar.p()
1
julia> Foo.x = 2
2
julia> Bar.p()
2
julia> OldFoo = Foo
Main.Foo
julia> module Foo; x = 3; end
WARNING: replacing module Foo.
Main.Foo
julia> OldFoo.x
2
julia> Foo.x
3
julia> Bar.p()
2
Types can't be redefined within the same module.
julia> struct T end
julia> struct T; x::Int64; end
ERROR: invalid redefinition of constant Main.T
Stacktrace:
[1] top-level scope
@ REPL[38]:1
Tools like Revise.jl automate the process of keeping source code in sync with running code, but can't work around the inability to redefine types.
Julia behaves as if late-bound, but compilation might inline functions. To avoid having to constantly check that functions haven't changed, julia specializes functions on 'world age'. Every new function definition increases the world age.
julia> Base.tls_world_age()
0x000000000000683c
julia> Base.tls_world_age()
0x000000000000683c
julia> f() = 1
f (generic function with 1 method)
julia> Base.tls_world_age()
0x000000000000683d
A function called at one world age will never see updated definitions from a later world age. To switch to the new age the programmer must explicitly call invokelatest.
julia> foo() = 1
foo (generic function with 1 method)
julia> bar() = foo()
bar (generic function with 1 method)
julia> function test()
println(bar())
eval(:(Main.foo() = 2))
println(invokelatest(bar))
println(bar())
end
test (generic function with 1 method)
julia> test()
1
2
1
julia> bar()
2
Top-level statements can perform arbitrary side-effects, so load order is trivially observable. In fact, loading a module Foo calls Foo.__init__() if it exists.
Code that runs at jit-time (eg @generated functions) can perform arbitrary side-effects, so jit order is observable.
julia> @generated function foo(x)
println("foo")
:x
end
foo (generic function with 1 method)
julia> @generated function bar(x)
println("bar")
:x
end
bar (generic function with 1 method)
julia> function quux(x)
foo(x)
bar(x)
end
quux (generic function with 1 method)
julia> quux(1)
foo
bar
1
julia> quux(1)
1
There is no clear phase separation and no guarantee of deterministic compilation. This creates many caveats for precompilation.
Types can be parameterized by values, but only 'plain data' values. This makes equality simple.
help?> isbitstype
search: isbitstype isbits isabstracttype ismutabletype isprimitivetype isconcretetype nonmissingtype
isbitstype(T)
Return true if type T is a "plain data" type, meaning it is immutable and contains no references
to other values, only primitive types and other isbitstype types. Typical examples are numeric
types such as UInt8, Float64, and Complex{Float64}. This category of types is significant since
they are valid as type parameters, may not track isdefined / isassigned status, and have a defined
layout that is compatible with C. If T is not a type, then return false.
There are no guards on evaluation time. A generated function with an infinite loop requires restarting the repl.
julia> @generated function foo()
while true
end
end
foo (generic function with 1 method)
julia> foo()
^C^C^C
clojure
> clj --version
Clojure CLI version 1.12.0.1479
Top-level expressions can have arbitrary side-effects, including mutating environments. Load order is very observable.
(ns test.core)
(def x 1)
(ns-unmap (find-ns 'test.core) 'x)
(def y x) ; Syntax error. Unable to resolve symbol: x in this context
Like julia, this creates many caveats for AOT compilation.
Clojure is early-bound, but binds to a mutable Var and dereferences it on every call:
user=> (def x 1)
#'user/x
user=> (defn foo[] x)
#'user/foo
user=> (foo)
1
user=> (def x 2)
#'user/x
user=> (foo)
2
user=> (defn bar[] y)
Syntax error compiling at (REPL:1:1).
Unable to resolve symbol: y in this context
It's possible to get a direct reference to the Var itself.
user=> x
2
user=> #'x
#'user/x
user=> (type #'x)
clojure.lang.Var
user=> (var-get #'x)
2
Circular references are handled by forward declaration:
user=> (declare odd)
#'user/odd
user=> (defn even [x] (if (zero? x) true (odd (dec x))))
#'user/even
user=> (defn odd [x] (if (zero? x) false (even (dec x))))
#'user/odd
user=> (even 3)
false
This means that it's not possible to have circular dependencies between namespaces (without shenanigans).
Reloading a namespace mutates all of the existing Vars rather than replacing them, but does not remove declarations that no longer exist in the new source:
user=> (println (slurp "./src/test/core.clj"))
(ns test.core)
(def x 1)
(def y 2)
(defn foo []
y)
nil
user=> (require 'test.core)
nil
user=> test.core/x
1
user=> (println (slurp "./src/test/core.clj"))
(ns test.core)
(def x 2)
(defn foo []
y)
nil
user=> (require 'test.core)
nil
user=> test.core/x
1
user=> (require 'test.core :reload)
nil
user=> test.core/x
2
user=> test.core/y
2
user=> (test.core/foo)
2
The reloading is made possible by every namespace having a globally unique name. This means that it's not possible to load two different versions of the same library.
Namespaces are first-class, but their value can only be obtained indirectly:
user=> test.core/x
2
user=> test.core
Syntax error (ClassNotFoundException) compiling at (REPL:0:0).
test.core
user=> (find-ns 'test.core)
#object[clojure.lang.Namespace 0x4cafa9aa "test.core"]
There are no guards on evaluation time. A macro with an infinite loop will require restarting the repl.
user=> (defmacro foo[] (loop [] (recur)))
#'user/foo
user=> (foo)
erlang
> erl -version
Erlang (SMP,ASYNC_THREADS) (BEAM) emulator version 13.2.2.14
Modules are not first-class and are typically just represented by a symbol, but all their contents are available via reflection functions, eg module_info.
The runtime stores the last two versions of each module. Loading a new version of a module causes the runtime to kill any processes that have direct references to the oldest version.
A direct function call foo() always calls the same version of the function as the calling function. An indirect function call module:foo() always calls the latest version of the function.
Closures never update their function version. Calling a closure fails if the matching code does not exist in either of the last two module versions.
1> c(test).
{ok,test}
2> X = test:echo("foo").
#Fun<test.0.31264253>
3> % echo is in the newest version of test
3> X().
"foo"
4> % deleted the echo function in test.erl
4> c(test).
{ok,test}
5> % but echo is still in the old version of test
5> X().
"foo"
6> % compiling again removes the old version of test
6> c(test).
{ok,test}
7> % now echo is not defined anywhere
7> X().
** exception error: bad function #Fun<test.0.31264253>
8> % undeleted the echo function in test.erl
8> c(test).
{ok,test}
9> % now the newest version of test once again contains a function whose hash matches the hash stored in X
9> X().
"foo"
Typically upgrades are done by reloading all changed modules and then messaging all affected processes to notify them that they need to do an indirect call soon to switch to the latest version. Making this work correctly requires careful planning when writing the code.
There is no compile-time evaluation and aot compilation is mostly straightforward, although load order is observable via on_load and can differ depending on whether code is loaded on demand or via a precompiled release.
These last few languages I'm not familiar with.
unison
> ucm --version
unison version: release/0.5.27 (built on 2024-10-01)
Top-level declarations can contain arbitrary pure expressions. Non-pure expressions won't type-check due to the effect system.
Declarations seem to be evaluated lazily, even though they're not typed as delayed values. You can add a declaration that causes an error and won't encounter an exception until you evaluate it (even though the type doesn't include the exception effect).
-- typed as `Nat`
foo = 1/0
bar b = if b then foo else 2
-- no exception, even though foo is reachable
> bar false
-- exception: divide by zero
> bar true
Namespaces don't really exist. Code is stored as a set of content-addressed values/functions. Version control is baked in to the language and each version of a project is just a map from identifiers like lib.base.data.List.map to values/functions. Declaring a namespace in a file is just sugar to avoid typing out the full identifiers of each declaration.
terra
Similar staging to zig, but with lua as the compile-time language. Lua is late-bound but terra is early-bound. Compiling a terra function does lua table lookups to resolve namespaced identifiers, which can cause arbitrary side-effects in lua, so compilation is not deterministic and compile order is observable.
other lisps
Other lisps that I've looked at share the same model inherited by clojure, where terms are evaluated one by one in order and can cause arbitrary side-effects. AOT compilation gets bolted on later and puts the onus on the programmer to ensure reasonable behaviour. Eg this common lisp discussion notes that:
...you wouldn't be able to reproduce that way; file compilation has different semantics than repl interaction (the latter happens strictly at "load" or "execute" time, while the former happens partially at "compile" time)
Racket is slightly different. It still allows arbitrary side-effects at compile-time, but where possible it rolls back mutations after compiling a module. This helps prevent runtime behaviour from depending on the order of compilation of modules, allowing modules to be compiled independently and in parallel as long as they refrain from non-undoable side-effects.