0049: hytradboi 2025, consulting, zest progress, labeled continue, bet against sql, zero-cost costs in debug, packed memory arrays, papers, books

Published 2024-10-08

hytradboi 2025

HYTRADBOI is coming back in 2025, this time with a programming languages track.

I have 14 speakers confirmed so far, but there is still plenty of room. Let me know who you want to see!

consulting

I'm looking to pick up some consulting work over the winter. I put up a little page here.

Most of my experience is in databases, especially query engines, but I'm also working on building expertise in compilers and wasm through zest. I'd be especially excited to be able to get involved with a project where I can flex those skills.

zest progress

I mentioned some tweaks to the ir and type inference last time. That went well - the new code is much easier to work with and I was able to delete a bunch of grody workarounds. The interpreter got a little faster and simpler as a side-effect, which is nice. The new ir made it easy to add %each and %from-only intrinsics, and I extended type inference to eliminate branches on constant types, so I'm now much closer to be able to write a generic print function that monomorphizes nicely. The only blocking features now is recursive functions - all the other features I listed last time are just nice-to-have.

I burned the rest of the month writing smolderingly fast b-trees, in the hopes that I'd be able to get good enough performance to solve the ordering problem. That hope isn't totally dead - Paul Khuong shared some ideas around fast lsms which I want to try out at some point (the asymptotics are the same as btrees, but the memory accesses don't have serial data dependencies so it's more friendly to speculative execution). But in parallel I'm also going to move ahead with compact hash-tables and just sort the key-value array before iterating. That gets me something that works without locking in any unpleasant semantics.

labeled continue inside switch

Zig added a feature that makes it easy to write direct threaded state machines.

The simple way to write a state machine is like this:

var state = initial_state;
while (true) {
  switch (state) {
    .a => { 
      ...
      state = .b;
    }, 
    .b => { 
      ...
      if (cond) {
        state = .a; 
      } else {
        break;
      }
    }
  }
}

When compiled this produces code that jumps to the correct branch for the state, and at the end of the branch jumps back to the top. That first branch is hard to predict so speculative execution by the cpu doesn't improve performance much.

The goal of direct threaded code is to generate a copy of that first jump at the end of every branch. This lets the branch predictor learn separate predictions for each branch. Eg in the .a branch above the next state is always .b, so with direct threading the branch predictor will quickly learn to always predict .b after .a.

There are various hacks for emit direct threaded code in c, and basically nothing for other languages. This new zig feature extends the switch statement like this:

machine: switch (initial_state) {
  .a => {
    ...
    continue :machine .b;
  },
  .b => {
    ...
    if (cond) {
      continue :machine .a;
    } else {
      break :machine;
    }
  },
}   

This generates exactly the code we want.

Changing the zig tokenizer to this style netted a 7% speedup and a smaller binary, which is pretty impressive for code that is already pretty tight, and for a change that improves readability.

I love this feature - state machines are everywhere. Sadly, I don't think I can add it to zest - there is no way to produce this code in wasm. I have heard that adding a suitable instruction has been discussed, but it doesn't seem likely to be high priority any time soon. I am curious whether generating tail-recursive closures would produce good results in existing wasm backends.

bet against sql

I love the ideas behind convex. I don't know that this is the best argument for them, but it's still really good to see people rethinking database UX from scratch.

vector math library codegen in debug

This is a really nice case study of how zero-cost abstractions can actually be very expensive in debug builds.

I really feel this in rust, where even basic features like iteration rely on heavy optimization. When working on materialize, debug builds were unusably slow so I had to wait for full release builds all the time.

I find it really interesting that in zig more of the abstraction work is pushed into language-guaranteed optimizations, which are by their nature much cheaper to produce. Debug builds of tigerbeetle are reasonably fast!

packed memory arrays rewired

I read a paper on packed memory arrays years ago and have never been able to remember or google-fu the term since. Paul Khuong reminded me, and I found this recent implementation which improves the worst-case for highly skewed inputs.

Sadly, it relies on virtual memory tricks that I can't replicate in wasm.

other papers

I went through a ton of btree and other data-structure papers this month, but basically none of them were worth reading. I also failed to reproduce results from several papers about btree optimizations.

books

The WEIRDest People in the World. I really wanted to like this book - I love the ideas - but I just couldn't get through it. After 3 extensions I gave up and returned it unfinished.

Normally I don't mention fiction on here, but this was such a good month for it:

Fight Me and The Bright Sword. Like all Grossman books these were excruciating to read but somehow also cathartic. The fact that all of their books are about the nagging feeling that you were supposed to do something with your life but missed the moment does make me wonder about their mental health though.

I'm Starting to Worry About This Black Box of Doom. I love all of Jason Pargins books (and also his blog). All the recent ones are half story, half author-insert monologues. But unlike most authors that do this, he has a talent for the ideological turing test - his characters come at the rants convincingly from all angles. It really feels like he wants to immunize people against the distorted view of the world created by news and social media, and he has the empathy to do that convincingly while also writing a fun story.

The Gone-Away World. There is a plot of sorts in here, but read it for the dialogue. It was really hard not to ruin it for everyone around me by constantly reading quotes out loud.