The tiny world of small URLs. It’s never been more important ✨ space & attention are scarce ...
Wow, Jakob. I really enjoyed how thorough this was.
The event-storming session with sticky notes is a great touch. It’s a perfect example of how starting with the why (links as signals of attention) and the domain events naturally leads you to an event-driven microservices architecture, rather than just forcing that pattern because it's trendy.
One thing I’m really curious about that wasn't fully addressed: how the analytics counters hold up at scale. You mention updating a counter in the main ShortLinks table for the MVP. At high scale, that counter increment on every redirect could become a brutal write hotspot for a viral link, even with an async process. I’ve seen teams handle this with probabilistic counters (like HyperLogLog for uniques) or a separate, write-optimized counters table that gets batched and merged into the main record later.
Thanks so much! 🙏🏻 I’m glad you enjoyed the event-storming part; that was exactly the idea behind showing the “why before how”. The two approaches you mention, probabilistic counters like HyperLogLog and a separate counters table with batch merges, are both valid, and I’ve also seen them work well. Another option is sharded in-memory counters: distribute updates across multiple shards instead of a single hotspot, then sum the shard values to get the total. That balances throughput with accuracy and avoids hammering one record. In your experience, did teams favor batching, probabilistic accuracy, or sharding?
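For concreteness, here’s a minimal sketch of what I mean by sharded counters, assuming Redis; the clicks:{link_id}:shard:{i} key scheme is just my own illustration, not anything from the article:

```python
import random
import redis

NUM_SHARDS = 16  # tune to the expected write concurrency per hot link
r = redis.Redis()

def increment_clicks(link_id: str) -> None:
    # Fan writes out over a random shard so no single key becomes a hotspot.
    shard = random.randrange(NUM_SHARDS)
    r.incr(f"clicks:{link_id}:shard:{shard}")

def total_clicks(link_id: str) -> int:
    # Sum all shards on read; slightly stale under heavy writes, but cheap.
    keys = [f"clicks:{link_id}:shard:{i}" for i in range(NUM_SHARDS)]
    return sum(int(v) for v in r.mget(keys) if v is not None)
```

The trade-off is a fan-out on reads, which you can hide behind a short-lived cache if totals are displayed often.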
Based on my experience, teams typically start by sharding critical counters, such as view counts, because it’s straightforward and easy to understand, even if reads aren’t perfectly accurate in real time. However, for metrics like unique-user analytics, they almost always choose a probabilistic approach with HLL. The storage savings are just too significant to ignore, and you rarely need exact counts for dashboards.
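For anyone reading along: Redis ships HyperLogLog out of the box (PFADD/PFCOUNT), so adopting it is cheap. A minimal sketch, where uniques:{link_id} is a made-up key and the visitor ID is whatever you derive from the request:

```python
import redis

r = redis.Redis()

def record_visit(link_id: str, visitor_id: str) -> None:
    # HyperLogLog: at most ~12 KB per key regardless of cardinality,
    # with a standard error of about 0.81%.
    r.pfadd(f"uniques:{link_id}", visitor_id)

def unique_visitors(link_id: str) -> int:
    return r.pfcount(f"uniques:{link_id}")
```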
Batching seems to be the fallback for very write-heavy, intermediate-state metrics where slight delays are acceptable. The right choice really depends on whether you’re optimizing for accuracy, performance, or storage.
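And a minimal sketch of that batch-and-merge fallback, with illustrative names throughout (the persist callback and the UPDATE it hints at are my assumptions, not from the article):

```python
import threading
from collections import Counter
from typing import Callable

class BatchedCounter:
    """Buffers increments in memory; a periodic job calls flush()."""

    def __init__(self) -> None:
        self._pending: Counter = Counter()
        self._lock = threading.Lock()

    def increment(self, link_id: str) -> None:
        with self._lock:
            self._pending[link_id] += 1

    def flush(self, persist: Callable[[str, int], None]) -> None:
        # Swap the buffer atomically, then issue one merged delta per link
        # instead of one row update per redirect, e.g.
        # UPDATE short_links SET clicks = clicks + :delta WHERE id = :link_id
        with self._lock:
            batch, self._pending = self._pending, Counter()
        for link_id, delta in batch.items():
            persist(link_id, delta)
```

The redirect handler calls increment(), and a timer or cron-style job calls flush() every few seconds, so a viral link costs one write per flush interval instead of one per click.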