New stuff:
- I drastically changed the internal representations in imp. This was setup work for decorrelation of higher-order functions and fix/reduce, but as a side effect it also made closures much smaller and fixed some nasty edge cases in type inference and the interpreter, bringing the runtime for integration tests from ~30s down to ~0.1s.
- Reflections on a decade of coding. Very tentative advice on progressing past the novice stage. I still feel very dubious about writing these but people have been very encouraging. I maybe have too high a bar on how confident I have to be that something is true before I think it's worth sharing.
- Better operator precedence. A slightly nicer way to handle precedence in Pratt parsers. Also allows having a non-total order on binding power (there's a small sketch of the idea just after this list). Derived from imp, where I mostly want to avoid having precedence rules at all, but they are crucial for a few very common combinations of operators.
- I added a simple make mode to focus that will run a bash command (autocompleting from .bash_history) and restart it on every save. I wanted to be able to jump to error locations too but I ran out of time this month.
- Mutant is a library that parses zig code and inserts random bugs. I was thinking of using it for some structured debugging practice.
- I keep seeing How safe is zig? used in silly flamewars. I've now added much more context to the beginning, so you at least have to read below the fold to get to the flamewar fuel.
- 2021 Q3 roundup
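To make the binding-power idea above a bit more concrete, here's a minimal sketch. It isn't the code from the post, and the operators and rules are made up for illustration, but it shows the core move: binding power is only a partial order, so in the Pratt loop Greater means reduce the current operator first, Less means the incoming operator takes the operand, and None gets reported as an ambiguity that needs parentheses.

    use std::cmp::Ordering;

    // Hypothetical operator set, just for illustration.
    #[derive(Clone, Copy)]
    enum Op { Add, Mul, Eq, And, Or }

    // Compare the operator we're currently parsing under (`left`) with the next
    // operator in the input (`right`). `None` means the pair is deliberately
    // unordered and the parser should demand explicit parentheses.
    fn compare(left: Op, right: Op) -> Option<Ordering> {
        use Op::*;
        match (left, right) {
            // arithmetic binds tighter than comparison, comparison tighter than bool ops
            (Mul, Add) | (Mul, Eq) | (Add, Eq) | (Eq, And) | (Eq, Or) => Some(Ordering::Greater),
            (Add, Mul) | (Eq, Mul) | (Eq, Add) | (And, Eq) | (Or, Eq) => Some(Ordering::Less),
            // left-associative with themselves
            (Add, Add) | (Mul, Mul) | (And, And) | (Or, Or) => Some(Ordering::Greater),
            // `a == b == c` and `a and b or c` are confusing, so leave them unordered
            (Eq, Eq) | (And, Or) | (Or, And) => None,
            // every pair without an explicit rule also requires parentheses
            _ => None,
        }
    }

    fn main() {
        // `a * b + c` parses fine; `a + b and c` is an error asking for parentheses.
        assert_eq!(compare(Op::Mul, Op::Add), Some(Ordering::Greater));
        assert_eq!(compare(Op::Add, Op::And), None);
    }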
The author of rust-analyzer made a video about error recovery in their recursive descent parser.
The bulk of the strategy seems to be: whenever you would return an error, just record it instead and continue as if you'd seen the expected token.
If that were all they did, then errors in half-written code could break the parse of finished code later in the file:
    struct fo| // cursor here
    fn foo() -> usize {
        // this is a valid function; we want to parse it correctly instead of
        // using its curly braces to finish the struct definition above
        return 42;
    }
That's handled by recovery sets (discussed here). I don't really follow the implementation yet though. I think it just continues to record errors without ever eating any tokens from the recovery set.
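Here's a minimal sketch of how I currently understand it (illustrative only, not rust-analyzer's actual code, and the token set is made up): expect records the error and pretends the missing token was there, while skip_until throws away junk but refuses to eat anything in the recovery set, so the fn that starts the next item survives.

    #[derive(Clone, Copy, PartialEq, Debug)]
    enum Token { KwStruct, KwFn, Ident, LBrace, RBrace, Eof }

    struct Parser {
        tokens: Vec<Token>,
        pos: usize,
        errors: Vec<String>,
    }

    impl Parser {
        fn peek(&self) -> Token {
            self.tokens.get(self.pos).copied().unwrap_or(Token::Eof)
        }

        // Instead of returning an error, record it and continue as if the expected
        // token had been seen. Crucially, the current token is not consumed: it
        // might belong to an outer rule.
        fn expect(&mut self, token: Token) {
            if self.peek() == token {
                self.pos += 1;
            } else {
                self.errors.push(format!("expected {:?}, found {:?}", token, self.peek()));
            }
        }

        // Skip unexpected tokens, but never eat anything in the recovery set
        // (tokens that can start a new item).
        fn skip_until(&mut self, recovery: &[Token]) {
            while self.peek() != Token::Eof && !recovery.contains(&self.peek()) {
                self.errors.push(format!("unexpected {:?}", self.peek()));
                self.pos += 1;
            }
        }

        fn parse_struct(&mut self) {
            self.expect(Token::KwStruct);
            self.expect(Token::Ident);
            self.skip_until(&[Token::LBrace, Token::KwFn]);
            if self.peek() == Token::LBrace {
                self.expect(Token::LBrace);
                self.skip_until(&[Token::RBrace, Token::KwFn]);
                self.expect(Token::RBrace);
            } else {
                // Half-written struct: record the missing body, but leave the
                // following `fn ...` alone for the item loop to parse.
                self.errors.push("expected struct body".to_string());
            }
        }
    }

    fn main() {
        // `struct fo` followed by a complete function, as in the example above.
        let mut parser = Parser {
            tokens: vec![Token::KwStruct, Token::Ident, Token::KwFn, Token::Ident, Token::LBrace, Token::RBrace],
            pos: 0,
            errors: vec![],
        };
        parser.parse_struct();
        // The struct's missing body is reported, but the parser stopped at `fn`,
        // so the function below the cursor can still be parsed correctly.
        assert_eq!(parser.peek(), Token::KwFn);
        println!("{:?}", parser.errors);
    }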
Two ideas for experiments:
- A downside of recursive descent parsers is that it's not clear what language they encode, so it's hard to port them. Would fuzzing rust-analyzer vs rustc be effective at uncovering differences? Or if you inserted deliberate bugs into rustc and fuzzed it against the original, what kinds of bugs would be found effectively?
- Treesitter has a totally different approach to error recovery that I think requires having an explicit grammar. How does it compare in practice? You could record a real coding session edit-by-edit and then check how often each of them is able to maintain a mostly-correct tree.
I also really like the offhand reference to recursive descent as 'immediate-mode parsing'.
Tonsky is writing a UI library. I'm glad more and more people are taking seriously the idea that the web is not the final end state of UI programming.
Jonathan Edwards published the design document for subtext 10, his latest experiment in end-user programming.
It's a good sign for a design doc when there are so many interesting ideas that I have the urge to implement it just so that I don't have to wait to play with it.
I read Factfulness. I don't have much to add to the abstract except that it's a good unicorn chaser for, uh, the internet.
I get amazing spam:
...uses the latest technology to make learning engaging and accessible. It's an extremely sophisticated edtech platform that uses the power of AI, facial recognition, smart chat, and blockchain technology to fix the issues of today's Education & Training Industries.
The power of buzzword, buzzword, buzzword and an irreversible way to send us money...
A PhD student emailed me to ask for advice on benchmarking streaming systems. My advice was to pick a different subject (I don't think that went down very well).
I think the scope of the problem is probably on par with benchmarking databases, except that the hardware costs are higher and the workloads much less well understood. The latter is especially problematic if you don't have access to some large tech company's internal workloads.
Very little of the academic work I've seen on the subject was worth doing at all.
http://blog.khinsen.net/posts/2020/02/26/the-rise-of-community-owned-monopolies/
http://blog.khinsen.net/posts/2021/06/10/the-dependency-hubs-in-open-source-software/
...you can perhaps maintain your own fork of Perl, but you cannot fork its hub position in the network, nor can you reasonably maintain forks of its 7964 dependants. If the Perl maintainers introduce a breaking change, those 7964 dependents will either adapt or disappear. Hypothetically, a large number of them could together envisage maintaining their own fork. But there are no good coordination mechanisms among developers of unrelated Open Source projects, and therefore this doesn't happen in practice.
https://nadiaeghbal.com/independent-research
From the author of Working in Public:
These days, if you say you work in research, most people assume you work in academia. But it's sort of odd that we assume you need someone's permission to do research.
Imagine studying something that nobody else is studying, for reasons you can't really articulate, without knowing what the outcome of your work will be. For the truly obsessed person, the need for validation isn't about ego; it's about sanity. You want to know there's some meaning behind the dizzying mental labyrinth that you simultaneously can't escape and also never want to leave.
https://archive.fosdem.org/2021/schedule/event/zig_wayland/
On writing a wayland client in zig. The main interest is that the zig api manages to be significantly harder to misuse without being harder to use.
Retool looks like a really nice database-to-UI glue. From one of the authors of subform and sketch systems.
Observable has a gorgeous UI for tracking dependencies between nodes.
https://simonsarris.com/work-on
How did this happen? Even rich people often live in terribly ugly homes. With all the advantages of technology, we rarely build beautiful structures. All the progress in mechanization and materials seems to be little match for the colossal bad taste that permeates modern environments. It is far easier to move the earth than it was 200 years ago, yet the end result gives a feeling that nobody is really trying their best.
This extends beyond structures. The machine age has let us do so much, but it has also degraded work and craft into a means, taking something away from us in the process. What it has done to work it has done to art and leisure as well. If we may have another renaissance, it will be a recapturing of beauty.
https://palladiummag.com/2021/09/24/a-world-without-sci-hub/
In his book The Systems Bible, Gall defined the operational fallacy as a situation where 'the system itself does not do what it says it is doing.' In other words, what a system calls itself is not always a reliable indicator of its true function. In this case, the name of the 'academic publishing industry' implies that it is supposed to be involved in the dissemination of scholarship. But the effective function of the academic publishing industry as it actually exists is to prevent the dissemination of scholarly work.
Few people today are priced out of streaming films or buying MP3s, but access to scientific information has become a true luxury item.
https://github.com/WebAssembly/design/issues/1397
Interesting discussion of how wasm interacts with virtual memory. Many of the underlying capabilities of the OS and hardware are not exposed.
https://amosbbatto.wordpress.com/2021/10/06/huawei-smartphone/
Ostensibly a breakdown of the components of a single phone, but it also contains a lot of in-depth analysis of the shape of the mobile hardware industry as a whole and the long-term strategies of various manufacturers.
Examination of the insides of Huawei’s phones shows that Huawei is relentlessly focused on controlling its own destiny and building up its own ecosystem. Samsung and Apple are famous for their control of their own ecosystems, but teardowns of their flagship phones show that they actually use a lower proportion of their own chips than Huawei. 50% (20 out of 40) of the ICs in the Mate 20 X (5G) are designed by HiSilicon, whereas 40% (12 out of 30) of the ICs in the Galaxy S10+ are designed and fabbed by Samsung and 14% (5 out of 36) of the ICs in the iPhone 11 Pro Max are designed by Apple. Huawei is at a distinct disadvantage compared to Samsung, since it doesn’t make its own RAM, Flash memory and image sensors like the South Korean company, but in half a decade Huawei learned to make many of the components in its phones, whereas Samsung has been working on its components for close to 3 decades.
An interesting comment:
https://lobste.rs/s/bh2epv/project_verona#c_xk1edj
Infrastructure languages have a set of requirements that slightly overlap both of these. They need to be safe because they're aimed at network-facing workloads that are going to be exposed to the public internet. A single security bug in the kinds of things that they target can cost millions of dollars. At the same time, they have strict resource (latency, memory overhead, downtime) constraints that also cost a lot of money if you violate. Programmers need to be able to reason about memory consumption and tail latency (which you can't do easily in a world with global GC) because they have SLAs that cost money if they are not met.
https://www.cs.utah.edu/docs/techreports/2021/PDF/UUCS-21-003.pdf
Racket added a new type to their runtime to improve the performance of Hash Array Mapped Tries. It seems like the main benefit is better layout and special-cased interaction with the garbage collector, so it would only be useful for managed languages.
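For anyone who hasn't run into HAMTs before, here's a generic sketch of the node structure in question (illustrative only, not Racket's actual representation): a bitmap plus a densely packed child array, where the child array is usually a separate allocation. That separate allocation and the extra pointer chase are presumably the kind of layout cost that a dedicated runtime type and gc support can avoid.

    // A generic HAMT node, for illustration only (not Racket's representation).
    // A 32-bit bitmap records which of the 32 logical slots are occupied, and
    // the occupied children are stored densely in popcount order.
    struct Node<K, V> {
        bitmap: u32,
        // In a straightforward implementation this is a second heap allocation,
        // separate from the node itself.
        children: Vec<Child<K, V>>,
    }

    enum Child<K, V> {
        Leaf(K, V),
        Branch(Box<Node<K, V>>),
    }

    impl<K, V> Node<K, V> {
        // Map a 5-bit chunk of the hash to an index into `children`, if present.
        fn slot(&self, chunk: u32) -> Option<usize> {
            let bit = 1u32 << (chunk & 31);
            if self.bitmap & bit == 0 {
                None
            } else {
                // count how many occupied slots come before this one
                Some((self.bitmap & (bit - 1)).count_ones() as usize)
            }
        }
    }

    fn main() {
        // Slots 1 and 3 occupied.
        let node: Node<&str, u32> = Node {
            bitmap: 0b1010,
            children: vec![Child::Leaf("a", 1), Child::Leaf("b", 2)],
        };
        assert_eq!(node.slot(1), Some(0));
        assert_eq!(node.slot(3), Some(1));
        assert_eq!(node.slot(2), None);
    }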
I mentioned chidb in the last update, but I just realized that it's part of a whole suite of didactic projects.