New:
- An opinionated map of incremental and streaming systems. I wanted to write about query planning for streaming systems, but it turned out to be hard to do that without first writing about streaming systems in general.
- First draft of thoughts on benchmarking streaming systems. This will likely evolve in parallel with the actual benchmarking and end up being an overview for the project.
- Notes on 'The mature optimization handbook'
I'm making the subquery optimization post public. Feel free to share it, tweet it, print it out and give copies to strangers on the street, etc.
I haven't settled yet on how to decide which posts will be public, but this one I feel I owe to the Materialize engineers, since I wrote this gnarly code and then left without fixing the downstream optimization issues.
A lot of the complexity of building e.g. web applications at scale is in propagating changes between different layers and reasoning about the consistency of the system as a whole. Database consistency models are primarily about data at rest, which makes it very difficult to compose them with the rest of the system. Much of my interest in incremental/streaming/continuous systems is because they provide a way to describe the motion of data over time across the entire system. I find timestamps and watermarks much easier to understand and compose than the zoo of database consistency models, and they also translate naturally into reasoning about other distributed systems.
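A minimal sketch of why watermarks compose, assuming a toy model where each input carries timestamped updates plus a watermark promising that no later update will have an earlier timestamp. The names `Input`, `combined_watermark` and `total_as_of` are made up for illustration and don't come from any particular system.

```python
class Input:
    """A stream of (timestamp, amount) updates plus a watermark."""

    def __init__(self):
        self.updates = []   # (timestamp, amount) pairs, possibly out of order
        self.watermark = 0  # promise: no future update has timestamp < watermark

    def send(self, timestamp, amount):
        assert timestamp >= self.watermark, "update is behind the watermark"
        self.updates.append((timestamp, amount))

    def advance(self, watermark):
        assert watermark >= self.watermark, "watermarks never move backwards"
        self.watermark = watermark


def combined_watermark(inputs):
    # Composition rule: a result is only as complete as its slowest input.
    return min(i.watermark for i in inputs)


def total_as_of(inputs, timestamp):
    """Sum of all updates at or before `timestamp`.

    Returns None if some input might still send updates at that timestamp,
    i.e. the answer is not final yet.
    """
    if timestamp >= combined_watermark(inputs):
        return None
    return sum(a for i in inputs for (t, a) in i.updates if t <= timestamp)


# Transfers between accounts: every debit has a matching credit with the
# same timestamp, so a consistent total is always 0.
debits, credits = Input(), Input()
debits.send(1, -10)
credits.send(1, +10)
debits.send(2, -5)   # the matching credit has not arrived yet
debits.advance(3)
credits.advance(2)

early = total_as_of([debits, credits], 2)  # None: credits may still change t=2
settled = total_as_of([debits, credits], 1)  # 0: both inputs are past t=1

credits.send(2, +5)
credits.advance(3)
late = total_as_of([debits, credits], 2)  # 0: t=2 is now closed on both inputs
```

The point of the sketch is that the reader never observes the half-applied transfer at timestamp 2. The same min-of-watermarks rule applies however many layers are stacked, which is what makes it composable.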
So I was excited to see in Joe Hellerstein's POPL keynote that his lab is going to be revisiting the CALM work that got me so excited about this field in the first place.
This led me on a talk-watching binge. Highlights:
- Peter Alvaro on What not where: why a blue sky OS. Designing an OS for persistent memory. Out with virtual memory mappings and filesystems, in with globally unique ids.
- Andy Pavlo on Persistent memory databases.
- CMU quarantine 2020 database talks. All interesting, and in much more technical depth than the typical tech conference talk.
What I'm working on:
- I'm going to see if I can reliably reproduce some of the inconsistencies mentioned in the map post above. Best-effort consistency tends to only fail at scale, so making demonstrations that people can easily reproduce at home might require some chaos monkeys.
- Still reading Systems Performance. It's an incredibly dense book - I'll likely be working through it for a few months.