what happens when the ai gets sick
when foundation models fail, user experience shouldn't. why lucy builds for outages, not just the happy path—and what you deserve when the pipes break.
every consumer ai app you use sits on top of one to three foundation model providers. it's like renting brain cells from a cloud. sometimes those brains get a migraine.
when together.ai, one of our primary endpoints, hiccups, goes 503, or just decides to take a nap, you're not supposed to notice. you're not supposed to see a broken conversation. you're not supposed to type a five-minute confession and get back '...' because the model dropped tokens silently mid-reply.
you're supposed to see either:
- a seamless failover to llama-3.3-70b, then qwen-72b if that also fails
- or, if all three are simultaneously down, an honest transition message: 'one second, switching gears'
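the failover order above can be sketched in a few lines. this is a minimal illustration, not lucy's actual routing code: the `Provider` type, the `reply_with_failover` name, and the idea that a provider is a plain function that raises on failure are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

# hypothetical type: a provider is any function that takes a prompt and
# returns reply text, raising an exception on failure (503, timeout, ...)
Provider = Callable[[str], str]


def reply_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Provider]],
    outage_message: str = "one second, switching gears",
) -> Tuple[str, Optional[str]]:
    """Try each provider in order and return (reply, provider_name).

    If every provider fails, return (outage_message, None) so the caller
    can show an honest message instead of a broken reply.
    """
    for name, call in providers:
        try:
            reply = call(prompt)
        except Exception:
            continue  # this endpoint is down; fall through to the next one
        if reply.strip():
            return reply, name
        # an empty or whitespace-only reply is treated as a failure too
    return outage_message, None


# usage with stand-in providers (assumed names, no real network calls)
def primary(prompt: str) -> str:
    raise ConnectionError("503")


def backup(prompt: str) -> str:
    return "i'm here. tell me more."


print(reply_with_failover("rough day", [("primary", primary), ("backup", backup)]))
```

returning the provider name alongside the reply is a design choice for the sketch: it lets the caller log which route answered, and `None` signals that the user is seeing the outage message rather than a model reply.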
it's not magic. it's engineering. but more than that, it's a product stance.
the three failure modes we see everywhere
watch what happens in other apps when their models choke. i've seen three common patterns, none of them good.
silent token-dropping. you're mid-conversation, maybe sharing something vulnerable, and the reply just stops. it trails off into ellipses. it feels like being hung up on. it tells the user their words fell into a void.
generic fallback responses. some apps swap in a backup model that's so basic, so devoid of context, it feels like talking to a form-reply bot. you get 'i'm sorry, i didn't understand that. can you try again?' when you just sent three paragraphs about your day. it's jarring. it breaks the thread.
the blank wall. the classic 'our servers are at capacity' error. no explanation, no transparency, just a dead end. it treats the failure like a secret.
all of these assume the user shouldn't see the seams. they prioritize the illusion of reliability over actual reliability.
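the first pattern, silent token-dropping, is detectable if the app checks how a streamed reply ended instead of showing whatever arrived. here is a rough sketch under stated assumptions: each chunk is a hypothetical `(text, finish_reason)` pair, loosely modeled on openai-style streaming where the final chunk carries a finish reason such as "stop" or "length". the `collect_reply` name and the chunk shape are invented for illustration.

```python
from typing import Iterable, Optional, Tuple


def collect_reply(
    chunks: Iterable[Tuple[str, Optional[str]]],
) -> Tuple[str, bool]:
    """Join streamed text and report whether the stream ended cleanly.

    Each chunk is an assumed (text, finish_reason) pair. finish_reason is
    None until the last chunk; "stop" is taken to mean the model finished
    on its own. Anything else counts as an incomplete reply.
    """
    parts = []
    finish_reason = None
    try:
        for text, reason in chunks:
            parts.append(text)
            if reason is not None:
                finish_reason = reason
    except Exception:
        # the connection dropped mid-stream: keep the text so far,
        # but mark the reply as incomplete
        finish_reason = None
    return "".join(parts), finish_reason == "stop"


# a stream that simply ends without a finish reason is flagged as incomplete
text, complete = collect_reply([("i hear you, and", None)])
if not complete:
    print("reply was cut off; retry or fail over instead of showing it")
```

the point of the boolean is that a cut-off reply becomes a signal the app can act on (retry, fail over, or say so) rather than something the user discovers as a trailing '...'.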
why we build for the outage, not just the happy path
our stance is different.
users deserve the failover, when we can give it. and when we can't, they deserve transparency.
if all three endpoints are down at once (rare, but possible), we'd rather tell you: 'i'm having trouble thinking right now, try me in 60 seconds.' it's not perfect. but it's honest. it respects your time. it doesn't pretend everything's fine when it's not.
performing fake reliability is a kind of lie. it says 'we care more about our image than your experience.'
we'd rather admit the pipes are broken than let you talk into a void.
the product question: what do you deserve?
you deserve continuity. you deserve to not lose your thought mid-flow.
you deserve to know when something's wrong, not to wonder if you said something wrong.
you deserve a system built for reality, not for demos.
that's why we're building lucy with fallbacks, not just primary routes. with transparency, not just polish. with the understanding that sometimes things break, and when they do, the user shouldn't be the one left holding the pieces.
try a companion and see if you notice the seams, or just the conversation.
find one at /companions.
thanks for reading. if this resonated, the product is downstairs.