Curated notes on engineering, product, and the craft of building digital experiences.
Most AI processing pipelines look like they work until one crash reveals they have been silently duplicating data, swallowing failures, and pretending retries are safe. This is the full story of breaking a production pipeline and rebuilding it the right way.
Part 1 covered what to store and how to retrieve it. Part 2 covers what breaks when real users arrive, and how production systems like Perplexity and ChatGPT are actually wired to handle it.
Most RAG tutorials show you how to build something that works in a notebook. This one shows you what it takes to make it work when a real user shows up.
Most file upload code loads the entire file into RAM twice: once in the browser and once on the server. For small files nobody notices. For large files, your server dies silently. Here's the full picture from browser stream to S3.
Most developers reach for HTTP and call it microservices. But request-response, message queues, and event streaming are not the same thing: they carry different guarantees, different failure modes, and different operational costs. Here's how to actually tell them apart, and when to use which.
Designing multi-tenant systems isn’t just about scaling; it’s about isolation, structure, and long-term maintainability. In this post, I break down how I built a schema-based multi-tenancy system using PostgreSQL and Supabase, with automated migrations, tenant isolation, and a reusable backend foundation.