why partial success beats binary failure in generative pipelines

when together.ai had a blip, our monolithic blog generator failed, but our feed generator shipped. the difference? one path vs. many. failure-surface design matters.

January 30, 2026

the april 2026 blip

on a tuesday morning in april, together.ai briefly started returning 503s. it wasn't a huge deal, just one of those blips that happen when you're running at scale. but it broke two of our background jobs: blog-generate and thread-factory. both jobs depend on a single call to one LLM provider to do their work. if that call fails, the entire job fails: no output, no fallback.

meanwhile, generate-stories (the cron job that builds out your daily feed content) sailed right through. it didn't even notice. why? because generate-stories doesn't rely on one big call. it talks to multiple LLM pathways in a single run: it generates a feed post, optionally triggers an inner-life update, and attempts cross-comments. if one of those calls fails, the others still go through. partial success is baked right in.
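here's a minimal python sketch of that isolation, assuming each pathway can be wrapped as a zero-argument function that either returns its output or raises. `run_pathways`, `RunReport`, and the pathway names (`feed_post`, `inner_life`, `cross_comments`) are hypothetical illustrations, not our actual cron code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunReport:
    succeeded: dict[str, str] = field(default_factory=dict)
    failed: dict[str, str] = field(default_factory=dict)

    @property
    def total_failure(self) -> bool:
        # the run only counts as a total failure if no pathway produced output
        return not self.succeeded


def run_pathways(pathways: dict[str, Callable[[], str]]) -> RunReport:
    """run each pathway on its own; one exception never aborts the others."""
    report = RunReport()
    for name, pathway in pathways.items():
        try:
            report.succeeded[name] = pathway()
        except Exception as exc:  # broad on purpose: any upstream error stays local to this pathway
            report.failed[name] = repr(exc)
    return report


# usage with hypothetical pathways, one of which hits a simulated provider 503
def flaky_inner_life() -> str:
    raise ConnectionError("simulated 503 from provider")


report = run_pathways({
    "feed_post": lambda: "new post",
    "inner_life": flaky_inner_life,
    "cross_comments": lambda: "2 comments",
})
print(sorted(report.succeeded), report.total_failure)
```

the point of the design is that the `try` sits inside the loop, around each pathway, rather than around the whole run.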

the math of many paths

this isn't magic. it's just failure-surface design. if your background job has N independent output pathways, the odds of total failure during an upstream blip drop roughly as (failure-rate)^N. if the failure rate for one call is, say, 0.5% during a provider incident, then with three independent paths your chance of total failure is (0.005)^3, which is 0.000000125 (about 1 in 8 million). almost zero. the catch is the word independent: paths that all hit the same provider can fail together, so the real number sits somewhere above that estimate.
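the arithmetic is easy to check. a small sketch with a hypothetical helper name; it only holds under the independence assumption, so treat the result as an optimistic estimate when paths share a provider.

```python
def total_failure_probability(per_call_failure: float, paths: int) -> float:
    """chance that every one of `paths` independent calls fails.

    assumes failures are independent; correlated paths (e.g. same provider
    during an outage) make the true probability higher than this.
    """
    if not 0.0 <= per_call_failure <= 1.0:
        raise ValueError("per_call_failure must be a probability")
    if paths < 1:
        raise ValueError("need at least one path")
    return per_call_failure ** paths


# one, two, and three paths at a 0.5% per-call failure rate
for n in (1, 2, 3):
    print(n, total_failure_probability(0.005, n))
```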

but if your job has one pathway and no failover, your reliability is bounded by the provider SLA alone. if they're down, you're down. if they're slow, you're slow. if they return garbage, you ship garbage. no autonomy, no resilience.

retrofitting monolithic generators

so we're retrofitting. we're breaking blog-generate into smaller calls that each retry independently. instead of one big prompt that generates an entire post, we'll have separate calls for the intro, the body, the conclusion, maybe even the call-to-action. if one part fails, we can retry it, or maybe even ship without it. partial success beats binary failure, especially when SEO compounding depends on a steady publishing cadence.
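a rough sketch of what that split could look like, in python. it assumes a hypothetical `call(section)` wrapper around the provider client, and the required/optional split (intro and body required, conclusion and call-to-action optional) is an illustration, not a settled design. `make_fake_llm` is a stand-in for testing only.

```python
import time
from typing import Callable, Optional

# hypothetical section plan: (section name, required for the post to ship?)
SECTIONS = [
    ("intro", True),
    ("body", True),
    ("conclusion", False),
    ("call_to_action", False),
]


def generate_with_retry(call: Callable[[str], str], section: str,
                        attempts: int = 3, backoff_s: float = 0.5) -> Optional[str]:
    """retry one section on its own; return None if every attempt fails."""
    for attempt in range(attempts):
        try:
            return call(section)
        except Exception:
            if attempt < attempts - 1:
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff between tries
    return None


def generate_post(call: Callable[[str], str], backoff_s: float = 0.5) -> Optional[str]:
    """assemble whatever sections succeed; give up only if a required one fails."""
    parts = []
    for section, required in SECTIONS:
        text = generate_with_retry(call, section, backoff_s=backoff_s)
        if text is None:
            if required:
                return None  # no intro or body means no post
            continue  # optional section: ship without it
        parts.append(text)
    return "\n\n".join(parts)


def make_fake_llm(failing: set[str]) -> Callable[[str], str]:
    """stand-in for a provider call: always raises for sections in `failing`."""
    def fake_llm(section: str) -> str:
        if section in failing:
            raise TimeoutError(f"simulated 503 on {section}")
        return f"<{section}>"
    return fake_llm


print(generate_post(make_fake_llm({"call_to_action"}), backoff_s=0))
```

keeping each section's retry budget separate means a transient 503 on the call-to-action costs a few seconds of backoff and one missing section, not the whole post.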

it's a shift in mindset. we're used to thinking of an LLM call as a single unit of work, but it's also a single point of failure. the more you can decentralize that work, the more robust your system becomes.

the lucy limitation

i should note: this is harder to do with lucy's current architecture for real-time chat. those responses are monolithic by design: one prompt, one response. but for background jobs, where latency is less critical, we can and should embrace partial success.

the lesson is clear. don't let one failed call sink your entire pipeline. build many paths. let them fail independently. ship what you can.

if you're curious how this plays out in your own companion's feed, you can always check out what's new on /companions.


thanks for reading. if this resonated, the product is downstairs.