the quiet machinery that keeps lucy talking

how lucy stays online during deepseek-v3 outages using a failover chain to llama and qwen, maintaining uptime even when the primary provider stumbles.

January 20, 2026

when deepseek-v3 had a regional outage recently, you might have noticed lucy getting a little slower, or maybe a little less sharp in her responses. you might not have noticed at all. that was the point.

the goal wasn't to make it exciting. the goal was to make it boring. reliable. the kind of thing you don't think about until it's gone.

the three-try chain

every time you send a message, lucy's backend doesn't just send it to deepseek-v3 and pray. it follows a simple, resilient plan:

  • try deepseek-v3 first. it's our primary. it's fast, and it's nuanced in ways that are perfect for conversation.
  • if that fails, try llama-3.3-70b. it's a strong model, a capable understudy, though it sometimes misses the subtle emotional cues v3 gets right.
  • if that also fails, try qwen-72b. a robust model, good for keeping the conversation going when the others are unavailable.

between attempts we apply exponential backoff: a short wait, then a longer one, to give a struggling api a chance to breathe.

this isn't magic. it's just a prioritized list and a few lines of code that say 'try the next one.' but it's the difference between a service that stays up and one that goes dark.
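to make "a prioritized list and a few lines of code" concrete, here is a minimal sketch in python. it is not lucy's actual backend. the names (`ask_with_failover`, `AllProvidersFailed`), the 0.5 second base delay, and the choice of one attempt per provider with a doubling wait in between are all assumptions made for illustration.

```python
import time
from typing import Callable, List, Sequence, Tuple

# hypothetical shape: a provider is a (name, function) pair, where the
# function takes the user's message and returns the model's reply.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(Exception):
    """raised when every provider in the chain has failed."""


def ask_with_failover(
    message: str,
    providers: Sequence[Provider],
    base_delay: float = 0.5,  # assumed value; the real delays aren't stated
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """walk the prioritized list and return (provider_name, reply)
    from the first provider that answers."""
    errors: List[str] = []
    for attempt, (name, call) in enumerate(providers):
        if attempt > 0:
            # exponential backoff: a short wait, then a longer one
            # (0.5s before the second try, 1.0s before the third)
            sleep(base_delay * 2 ** (attempt - 1))
        try:
            return name, call(message)
        except Exception as exc:  # a real client would catch only timeouts / 5xx
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

you would pass the providers in priority order, for example deepseek-v3, then llama-3.3-70b, then qwen-72b. the `sleep` parameter exists only so the waits can be swapped out in tests.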

the user-visible trade-off

the trade-off for this resilience is subtle. during an outage on the primary provider, you get:

  • slightly slower responses, because we're waiting on retries and maybe using a slower api endpoint.
  • slightly lower quality, because while llama and qwen are powerful models, they aren't fine-tuned for companion-style nuance like deepseek-v3 is. the difference is often in tonal subtlety, not coherence.

but you get no downtime. the chat doesn't error out. lucy doesn't vanish. the conversation continues, a little dimmer maybe, but unbroken. we think that's a fair deal.

the circuit-breakers that save your money

this is the part that really isn't glamorous but matters just as much. if deepseek-v3 is completely down (not just slow, but returning errors), we don't keep hammering it with requests every few seconds. that would be wasteful. it burns through api credits and doesn't help anyone.

so we use circuit-breakers. if a provider fails consistently, we stop sending requests to it for five minutes. we give it time to recover. after that, we let a single request through to test the waters. if it works, we reopen the floodgates. if it fails, we reset the timer.

this isn't just about saving money (though it does that too). it's about not making a bad situation worse by creating artificial load on a system that's already struggling.
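the circuit-breaker described above can be sketched as a small state machine. again, this is an illustration, not the production code. the five-minute cooldown comes from the description above, but the class name, the method names, and the threshold of three failures are assumptions (we only said "consistently").

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """closed while healthy, open after repeated failures, then a single
    trial request once the cooldown has passed."""

    def __init__(
        self,
        failure_threshold: int = 3,       # assumed; the real threshold isn't stated
        cooldown_seconds: float = 300.0,  # five minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed
        self.trial_in_flight = False

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if self.trial_in_flight:
            return False  # only one trial request at a time
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            self.trial_in_flight = True  # let a single request test the waters
            return True
        return False  # still cooling down

    def record_success(self) -> None:
        # the provider answered: close the circuit and forget old failures
        self.failures = 0
        self.opened_at = None
        self.trial_in_flight = False

    def record_failure(self) -> None:
        if self.trial_in_flight:
            # the trial request failed: reset the timer
            self.trial_in_flight = False
            self.opened_at = self.clock()
            return
        if self.opened_at is None:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open the circuit
```

in use, the backend would check `allow_request()` before calling a provider and skip straight to the next one in the chain when it returns false, then report the outcome with `record_success()` or `record_failure()`.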

why this isn't exciting (and why that's good)

none of this is cutting-edge ai research. it's not a new language model. it's engineering. it's the kind of infrastructure work that makes products robust instead of brittle.

it's the plumbing. you don't want to think about your plumbing until it fails. we'd rather you never have to.

the real test of a system isn't when everything is perfect. it's when one critical piece falls over. and quietly, without fanfare, the backups step in.

you can see this resilience for yourself in every conversation.

try talking to lucy now, and see how it feels.


thanks for reading. if this resonated, the product is downstairs.