0047: baby's second wasm compiler, zig honggfuzz, values can be values, don't look UB, surely you can be serious, other links, books

Published 2024-07-11

I finally got the zest compiler architecture and codegen more or less settled. Lots of details in baby's second wasm compiler. I'm gonna take a two week break and then move on to writing the self-hosted runtime.

zig honggfuzz

Zig doesn't have support for coverage-guided fuzzing, so I made a little demo of how to use Zig's C backend with honggfuzz and clang.
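The demo itself isn't reproduced here. As a rough sketch of the shape such a setup can take, assume the Zig code under test is exported with C linkage (the `parse_input` function below is hypothetical, written directly in C as a stand-in for what Zig's C backend would emit). honggfuzz's `hfuzz-clang` wrapper can drive a libFuzzer-style `LLVMFuzzerTestOneInput` entry point in persistent mode, so the harness only needs to call into the exported function and crash on anything unexpected:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Zig function such as
       export fn parse_input(ptr: [*]const u8, len: usize) c_int
   compiled to C by Zig's C backend. This toy version accepts a
   non-empty run of ASCII digits. Returns 0 on success, -1 on error. */
int parse_input(const uint8_t *data, size_t len) {
    if (len == 0) return -1;
    for (size_t i = 0; i < len; i++) {
        if (data[i] < '0' || data[i] > '9') return -1;
    }
    return 0;
}

/* libFuzzer-style entry point. When built with hfuzz-clang, honggfuzz
   supplies main() and calls this repeatedly with mutated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    int result = parse_input(data, size);
    /* Any result other than 0 or -1 means the code under test is broken:
       abort so the fuzzer records the crashing input. */
    if (result != 0 && result != -1) abort();
    return 0;
}
```

The build would then be roughly: emit C from the Zig source (e.g. with `-ofmt=c`), compile that output together with the harness using `hfuzz-clang`, and point `honggfuzz` at the resulting binary. The exact flags are an assumption here, not taken from the demo.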

values can be values

We use the word 'value' to describe, uh, values in programming languages. But also in hashmaps/objects we have key-value pairs. Which means I find myself writing docs like 'both key and value can be any value'. This is confusing. Can I replace one of these uses of 'value' with a different word?

Don't look UB

In the presence of undefined behaviour, compiler optimizations can remove the sanitizer checks that would have detected the undefined behaviour. IIUC this mostly affects ASan and MSan because they are inserted after optimization. UBSan is somewhat more reliable because it inserts non-inlinable calls to sanitizer functions before optimization, and the optimizer usually can't remove those. But there is still some tragicomedy:

This is undefined behavior so UBSan inserts a check that will report this error and abort the execution. Because the UBSan check is a side effect in a function assumed to be pure, the check itself is now undefined behavior. In the final program the optimizer removes the undefined UBSan check and substitutes the evaluation of ub() with an arbitrarily chosen value.

Many of Zig's checks are inserted as LLVM IR before optimization. I'm not sure whether they could be susceptible to the same kinds of problems.
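A smaller, more familiar relative of the problem above is a hand-written overflow check placed after the overflow has already happened: since signed overflow is UB, the optimizer is allowed to assume it never occurs and delete the check. The C sketch below (function names made up for illustration; `__builtin_add_overflow` is a GCC/Clang extension) contrasts that pattern with two checks that run before or instead of the overflowing operation, which is the position a check emitted before optimization occupies:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* BROKEN (shown for contrast, not compiled): the check runs after the
   signed overflow, which is UB. The optimizer may reason that with b > 0,
   a + b < a is impossible, and delete the check.

       int broken_add(int a, int b) {
           int sum = a + b;               // UB if this overflows
           if (b > 0 && sum < a) abort(); // may be optimized away
           return sum;
       }
*/

/* GCC/Clang builtin: stores the wrapped result in *out and returns true
   if the mathematically correct sum did not fit. No UB is executed. */
bool checked_add(int a, int b, int *out) {
    return !__builtin_add_overflow(a, b, out);
}

/* Portable ISO C version: compare against the limits before adding, so
   the addition only happens when it cannot overflow. */
bool checked_add_portable(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) return false;
    *out = a + b;
    return true;
}

/* Safety-checked add: traps on overflow instead of invoking UB. */
int add_or_abort(int a, int b) {
    int sum;
    if (!checked_add(a, b, &sum)) abort();
    return sum;
}
```

The builtin usually compiles down to an add followed by a branch on the CPU's overflow flag, while the portable version works on any C compiler at the cost of an extra comparison.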

Wasm doesn't have UB at all, and regular zest won't either. The zest runtime will need access to some unsafe operations, so I'll have to think hard about how those will work alongside my own optimizations.

surely you can be serious

...hoping that someone would eventually give me permission to be serious, to start doing things that are good instead of things that look good.

Before that, I believed what everybody else seems to believe: if you play the game well enough and long enough, eventually you get to stop playing and go do whatever you want. I played the game pretty well for a long time, and now it's obvious to me that the reward for playing the game is more game. You just keep unlocking levels forever, and the levels don't even get more interesting ("Ooh, this one is in space!"). It's just the same thing over and over until you die. You don't get out by winning; you get out by stopping.

That's why you gotta be serious about something. It's like a protective amulet that prevents you from Goodharting yourself. Unserious people might seem free, unburdened by the dreadful commitment of caring about anything. But they are in fact hackable and distractible, susceptible to whatever game can trap them in a behavior loop. They're like that guy in that 1990s anti-drug commercial, walking in circles, muttering, "I do coke...so I can work longer...so I can earn more...so I can do more coke."

"Well, we don't live in a utopia. You have to make tradeoffs in life." Yes, of course! But the whole point of tradeoffs is to trade something you value less for something you value more.

other links

No raw data, no science.

As an Editor-in-Chief of Molecular Brain, I have handled 180 manuscripts since early 2017 and have made 41 editorial decisions categorized as 'Revise before review', requesting that the authors provide raw data. Surprisingly, among those 41 manuscripts, 21 were withdrawn without providing raw data, indicating that requiring raw data drove away more than half of the manuscripts. I rejected 19 out of the remaining 20 manuscripts because of insufficient raw data.

Among the 40 withdrawn or rejected manuscripts, 14 were later published in other journals.

Bridging between source languages, in Wasm. Explaining why the wasm component model is needed, and why other obvious solutions aren't satisfactory.

Posts one and two about a new columnar file format. I haven't actually grappled with this problem in practice so I'm not a good judge, but the thought process behind their design does appeal to me.

academicish voice. An internal document from ink-and-switch outlining their writing style and the reasoning behind it.

OpenTOC. Links to open access PDFs from various ACM SIG conferences.

Making No-Fuss Compiler Fuzzing Effective. Fuzzing compilers doesn't work very well because the mutations most fuzzers use very rarely produce valid code. Language-specific tools like Csmith are effective but require a huge investment, and they also don't find bugs where the official language grammar differs from what the compiler accepts. So they made a regex-based mutator that has the barest understanding of language syntax, and this was enough to produce way more valid mutations and find a bunch of bugs in existing industrial compilers. I kind of want to try doing this for zest, but make a mutator that parses the input, mutates the concrete AST and then prints it out.
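The paper's actual mutator and rule set aren't reproduced here. As a rough C sketch of the general idea, using POSIX `<regex.h>` and a handful of made-up rewrite rules (the `rule` type, `RULES` table, and `mutate` function are all hypothetical), a single mutation picks one regex match in the source text and rewrites it:

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* One mutation rule: every match of `pattern` (POSIX extended regex) is a
   candidate site, and a mutation replaces one site with `replacement`. */
typedef struct {
    const char *pattern;
    const char *replacement;
} rule;

/* Illustrative rules, invented for this sketch. */
static const rule RULES[] = {
    {"==", "!="},
    {"\\+", "-"},
    {"[0-9]+", "0"},
    {"<", ">"},
};

/* Replace the `which`-th (0-based) non-overlapping match of r->pattern in
   src with r->replacement. Returns a malloc'd string the caller must free,
   or NULL if the regex fails to compile or there are too few matches. */
char *mutate(const char *src, const rule *r, size_t which) {
    regex_t re;
    if (regcomp(&re, r->pattern, REG_EXTENDED) != 0) return NULL;

    const char *cursor = src;
    regmatch_t m;
    int eflags = 0;
    size_t seen = 0;
    char *out = NULL;

    while (regexec(&re, cursor, 1, &m, eflags) == 0) {
        if (seen == which) {
            /* Offsets of the chosen match relative to the start of src. */
            size_t start = (size_t)(cursor - src) + (size_t)m.rm_so;
            size_t end = (size_t)(cursor - src) + (size_t)m.rm_eo;
            size_t rep_len = strlen(r->replacement);
            size_t tail_len = strlen(src + end);
            out = malloc(start + rep_len + tail_len + 1);
            if (out) {
                memcpy(out, src, start);
                memcpy(out + start, r->replacement, rep_len);
                memcpy(out + start + rep_len, src + end, tail_len + 1);
            }
            break;
        }
        seen++;
        /* Advance past this match; on an empty match step one character
           so the loop always makes progress. */
        if (m.rm_eo == m.rm_so) {
            if (cursor[m.rm_eo] == '\0') break;
            cursor += m.rm_eo + 1;
        } else {
            cursor += m.rm_eo;
        }
        eflags = REG_NOTBOL; /* later searches don't start at line start */
    }
    regfree(&re);
    return out;
}
```

A fuzzing driver would pick a rule and a match index at random, hand the mutated program to the compiler under test, and keep any input that makes the compiler crash.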

books

Endure. Surprisingly satisfying pop-sci. A history of the study of endurance sports, told with an appropriate amount of skepticism and epistemic humility, and not trying to sell you on any particular result.

How to know a person. Not particularly concrete advice, but I at least enjoyed reading about the author's own experiences.

A philosophy of software design. I didn't get much out of it myself and I definitely found myself nitpicking a lot, but it is by far the least bad book I have read on software design. Certainly the only one that I could recommend with a straight face.