New stuff:
- The first draft of internal consistency in streaming systems. I still have a few more systems to study, repros to write and authors to badger for fact-checking.
- Memory-mapped IO registers in zig - a little self-contained case study of api design in zig. Kevin Lynagh is writing a much more detailed post on his experience writing keyboard firmware in zig, which should be out in time for the next update.
- Notes on A small matter of programming.
I'm also making the map of incremental and streaming systems public.
Work on reproducing inconsistency in kafka streams is not going well.
I have been able to stand up their page views example and feed it data, despite the best efforts of their documentation. Unfortunately the left join always returns nulls regardless of the input data. I'll continue banging my head against it, but if anyone would like to join in you can find the code here.
One thing to note is that when I asked for help on the confluent slack, I got suggestions like "maybe it's a race condition" or "maybe you need to tweak this config variable". What I didn't get was "wow it's really surprising that the examples don't work correctly out of the box".
I think one of the best ways to make a complex idea understandable is to boil it down to an implementation small and clear enough that a motivated student can read and understand it. rxi has a knack for doing this:
- fe is an embeddable lisp-alike in ~1000 lines of c.
- microui is an immediate-mode GUI library in ~1700 lines of c. (The gui library for my text editor is an almost direct port).
- lite is a fully featured text editor. The core is ~1500 lines of c and the rest of the functionality is built up in ~5500 lines of lua.
Martin Kleppmann is starting a patreon. He's maybe best known for Designing Data-Intensive Applications but I'm more excited about his work on making CRDTs practical.
This paragraph resonated with me:
For me it is important to have this mixture of research, open source software development, and teaching (through speaking and writing), because all of these activities feed off each other. I don't want to just work on open source without doing research, because that only leads to incremental improvements, no fundamental breakthroughs. I don't want to just do research without applying it, because that would mean losing touch with reality. And I don't want to just be a YouTuber or writer without doing original research, because I would run out of ideas and my content would get stale and boring; good teaching requires actively working in the area.
In case you missed it, Vectorized released a set of benchmarks comparing Redpanda and Kafka. The article itself is not great - just enough detail to sound impressive but not enough to be educational - but I'm struck by how Redpanda's line in every graph is totally flat.
I'm also struck that they mention running >400 hours of actual benchmarks. With the machine configurations they mention, that adds up to ~$5k. Chump change for a business but kind of intimidating when you're living on sponsorships.
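As a rough sanity check on that figure, here is a minimal sketch that uses only the two numbers quoted above. The function name and constants are made up for illustration, and the result is an implied aggregate rate, not a price taken from any cloud provider's list.

```python
# Back-of-envelope check using only the two figures quoted above.
BENCHMARK_HOURS = 400    # ">400 hours", so treat this as a lower bound
TOTAL_COST_USD = 5_000   # "~$5k", an approximate total


def implied_hourly_rate(total_cost_usd: float, hours: float) -> float:
    """Combined cost per benchmark hour across every machine in a run."""
    return total_cost_usd / hours


rate = implied_hourly_rate(TOTAL_COST_USD, BENCHMARK_HOURS)
print(f"~${rate:.2f} per benchmark hour")
```

Since the hour count is a lower bound, the implied rate of about $12.50 per benchmark hour is closer to a ceiling than an exact number, and it covers the whole set of machines in a run rather than a single instance.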
I'm going to be very distracted for the next two weeks. If I get a chance I'll write up some more old work, but more likely the next update will be in late March.